[Excerpts]


Abstract 


A new neural network model, a general-purpose master- 
slave neural network model, is presented. The general- 
purpose nature of this model is proven by two master- 
slave control methods. 


Key words: master-slave neural network, energy func- 
tion, local minimum, system stability. 


I. Introduction 


Just as with earlier automatons, existing artificial neural 
networks are dedicated systems (counting special sym- 
bols for example). General-purpose computers began to 
emerge after the concept of “stored program” was intro- 
duced. The program determines the sequence of infor- 
mation processing. If this concept is applied to a neural 
network, we will find that its function is determined by 
the architecture of the network (which determines the 
processing sequence of network information) and the degree
of excitation of its neurons. To this end, the key step in the
design of a general-purpose neural network is to build a 
master control module. Its output can be used to regulate 
the weighted value of the slave module, or to “clamp” 
the degree of excitation of the slave module. The struc- 
ture of the main module is “programmed” to define its 
function. Figure 1 shows a simple master-slave neural
network we have designed. The master module controls 
the weighted value of the slave module by way of several 
intermediate channels. The topological architecture of 
the slave module is arbitrary. (The slave network shown 
in the figure has a layered structure.) 


The proposed general-purpose master-slave neural network
can be controlled by two different methods. (1) The output
of the master network controls the degree of excitation of
the slave network's neurons in the form of an external
current. (2) The master network output regulates the link
weights of the slave network. In practice, (1) and (2) may
be combined.


Figure 1. A Simple Master-Slave Neural Network


Specifically with reference to these two basic methods, 
two master-slave models are designed and simulated on
a computer. 


II. General-Purpose Master-Slave Model With Link 
Weight Regulation 


A BP network is essentially a nonlinear mapping
function:¹


Y = F_n(W_n · F_(n-1)(W_(n-1) · ... · X))   (1)


where X and Y are the input and output vectors of the
network, and W_j is the link weight matrix between the jth
and (j + 1)th layers.


The shortcoming of the present BP [backpropagation] 
network is its slow learning process. It is difficult to use 
hardware to reflect the change in the weight. Further- 
more, the learning algorithm essentially is a nonlinear 
optimization process. There is not a good method for 
overcoming difficulties associated with local minima. 
Specifically in response to these problems, the network 
shown in Figure 2 is introduced. A Hopfield network 
with m x n neurons is used to replace the weight matrix 
W (of size m x n) for every layer of the BP network. The output of
each neuron V_ij represents weight W_ij of the original BP
network. Since the nature of a Hopfield network is that 
the energy function of the network is minimized when it 
converges,” it is possible to utilize this behavior of the 
Hopfield network to help converge the BP network if the 
error function of the BP network is treated as the energy 
function of the Hopfield network. [passage omitted] 











[Figure 2 diagram not reproduced; legible labels: hidden layer, multiplier array, output, ideal output]
Figure 2. General-Purpose Master-Slave Neural Network
With Tunable Weighted Interconnection





A control layer is added on top of the Hopfield layer. A
bias current is injected to cause the error function of the
BP network to jump out of a local minimum. This technique
is also used in the other model discussed in this paper.


By now, the following conclusions can be reached for this 
model: 


(1) From equation (3), regardless of whether it is a BP 
network or Hopfield network, the weight is expressed in 
terms of the output state of the neuron. Thus, the model
does not depend on the specific problem and the network is
more general-purpose in nature. It is relatively easy to
realize a variety of nonlinear mappings. 


(2) The model can easily be constructed in hardware 
because weight link matrices can be realized by means of 
multipliers. Compared to using resistors or RAM, it is 
more flexible and accurate. 


(3) The learning process of the BP network is automati- 
cally completed under the control of the Hopfield layer. 


This model is used to solve the “mirror image symmetry”
recognition problem presented by D. E. Rumelhart. When an
n-dimensional vector comprised of n symbols is applied to
the input of the network, the output should indicate whether
the string is symmetric from left to right (for example,
00111100 is a symmetric string). This is a classic problem
for testing the BP algorithm, and it places high demands on
the stability of the BP network. The data provided by
Rumelhart show that the network needed to be trained
approximately 90,000 times before converging when n = 6.
In our model, the nonlinear mapping function of the
Hopfield network is chosen to be f(u) = 1/(1 + exp(-a u)).
When 0.2 < a < 2 and 0.1 < ε < 0.8 (where ε is the
iteration step given in equation (4)), the network converges
after being trained 32,000-60,000 times. Figure 3 shows the
number of times needed to reach convergence as a function
of a and ε. Apparently, this number is related to the
product of a and ε: when the product takes a specific
value, the number required to converge is minimized. The
farther the product is from this value, the larger the
number of times required to converge, and the network may
not converge at all.
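

The test problem and the response function quoted above can be written down compactly. The following is only a sketch of the task definition, not of the master-slave training procedure itself (which is in the omitted passage); the function names are hypothetical, and the gain a = 1.0 is just one value inside the quoted range.

```python
import math
from itertools import product

def sigmoid(u, a=1.0):
    """Response function f(u) = 1 / (1 + exp(-a * u)); a is the gain from the text."""
    return 1.0 / (1.0 + math.exp(-a * u))

def is_mirror_symmetric(bits):
    """Ideal output: 1 if the string reads the same in both directions, else 0."""
    bits = list(bits)
    return 1 if bits == bits[::-1] else 0

def symmetry_training_set(n):
    """All 2**n binary input vectors, each paired with its ideal output."""
    return [(bits, is_mirror_symmetric(bits)) for bits in product((0, 1), repeat=n)]

data = symmetry_training_set(6)
symmetric_count = sum(target for _, target in data)
print(len(data), symmetric_count)
```

For n = 6 only a small fraction of the input strings are symmetric, which is one reason the problem is a demanding test of a learning algorithm.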


[Figure 3 plot not reproduced; legible labels: 62,000, 32,000 and 44,000 times to converge; axis ticks 0.5, 1.0, 1.5, 2.0]
Figure 3. a and ε Curve





III. General-Purpose Master-Slave Model Using 
External Current Control 


In order to solve an NP problem in conventional
optimization, i.e., the traveling salesman problem (TSP),
a master-slave neural network such as the one shown in
Figure 4 has been designed. The slave network is a Hopfield
network and the master network is a conventional computer.
Based on the link matrix, an effective Lyapunov function
has been designed. The convergence of the new model and the
validity of the solution are verified by analyzing the
eigenvalues of the network link matrix and the network
dynamic equation. [passage omitted]


Figure 4. Master-Slave Model Using External Current
Control





IV. Conclusions 


The master-slave models presented in this paper not only 
are capable of delineating the characteristics of the 
complicated biologic neural system (master-slave 
differentiation)® but also are to some extent general- 
purpose in nature by stressing the use of a conventional 
computer. Master-slave differentiation means that
information is processed separately based on its relevance.
This reduces the communications load and makes it feasible
to build a coarse artificial neural network processor.


[Text]


Abstract 


A floating-gate NMOS transistor neural network has been
designed. Its features include simple structure, small chip
area and continuously adjustable interconnection weight. In
addition, it possesses the characteristics of distributed
neurons and can be cascaded to form a large network. An
8 x 8 interconnected neuro-chip has been fabricated using a
3 μm floating-gate NMOS process. The chip contains 128
programmable floating-gate NMOS transistors, which is
equivalent to a fully interconnected network comprised of
8 neurons. It has been used in number recognition and
binary image processing. The results show that the network
has a great deal of potential. Furthermore, it is highly
flexible and easy to fabricate because of its simple
structure and adaptability to IC technology.


Key words: artificial neural network, IC, circuits
principle and design.


I. Introduction 


Theoretical analysis of neural networks is still being
conducted. Research on neural networks must still rely on
analog tools. Hardware support is required in solving
complex problems involving large-scale parallel systems.
Hence, a lot of interest is focused on the implementation
of neural networks. A large number of electronic methods
are available to implement neural networks. In the area of
VLSI, there are digital circuits,¹ analog circuits,²
digital/analog circuits,³ voltage circuits, current
circuits,⁴ pulse current circuits, etc. The major feature
of a neural network is the formation of a large nonlinear
dynamic system through the interconnection of a large
number of simple elements. When a network is implemented
as a parallel system, the element ought to be simple in
structure and small in chip area in order to form a
large-scale network. The high computing power and
flexibility of a neural network come mainly from adjusting
the synaptic weights. It is therefore imperative to be able
to fabricate neural networks that have continuously
adjustable weights.


The basic element of a neural network is not only an
operator but also a storage device. Its complexity and area
are determined by both operation and storage requirements.
As far as operation is concerned, most conventional methods
are based on theoretical algorithms and do not fully
utilize the physical characteristics of the device, so it
is very difficult to arrive at a simple design. Active
devices in a digital circuit behave as switches. A device
operates in the saturation region and cutoff region only,
and its characteristics are poorly utilized. To complete a
computation, the circuit is complicated, the chip area is
large and the computation time is long. Nevertheless, this
kind of circuit has a high level of tolerance to noise and
is the most mature circuit in use. In an analog circuit, an
active device operates in a linear region. The physical
laws in the linear region may be utilized in performing the
computations. Compared to digital circuitry, its
utilization rate is higher, the chip area is smaller and
the computing speed is faster. However, the circuit is more
susceptible to noise. Even though the accuracy of each
individual device is sufficiently high, errors accumulate
and the final result is severely impacted by noise when the
entire computation is completed. Therefore, it is difficult
to make a large-scale computation system with this kind of
circuitry. Since the correctness of a neural network
depends not only on the accuracy of each individual device
but also on the feedback of the network, it does not
require absolutely precise device parameters, rigorously
matched devices or accurate time constants. We should
therefore be able to take more advantage of the physical
characteristics of the device than an analog circuit does
(i.e., by letting the transistor operate over its entire
characteristic region) so that the basic computation
element has the simplest structure and occupies the least
amount of area.


The storage of analog data at the basic element level is
one of the key issues in the implementation of neural
networks. Its behavior also determines the performance of
the entire network. The floating-gate structure is a simple
and low-chip-area analog storage device. It consumes little
power for erasing and writing. The charge leakage of
floating-gate storage is also low, which makes it
especially suitable for the storage of synaptic weights.


A simple and highly flexible neuro-chip has been 
designed and fabricated based on a 3 μm floating-gate
NMOS process. It employs the floating-gate NMOS 
device to perform the computation and storage function 
of the basic element. This not only greatly simplifies the 
programmable network but also makes it adaptable to IC 
technology. A model was built for number recognition 
and it is used as an example to demonstrate its applica- 
tion as a Hamming network. It was also used in noise 
cancellation and edge detection of binary images to 
illustrate the fact that the chip might be used in variable 
threshold circuits. 





II. Design and Implementation of Neuro-Chip 


Figure 1 shows the neuron of a network comprised of floating-gate NMOSFETs. Since all neurons are identical in
structure in this network, they essentially have the same conductivity factor and parasitic capacitance. Hence, the 
interconnect strength can only be adjusted by threshold voltage. The dynamic equation describing this kind of a 
nonlinear network is as follows: 


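dU_i/dt = K [ SUM(j = 1 .. N) F_ij + SUM(j = N+1 .. N+M) F_ij ]


F_ij takes one form of f for an excited interconnect and
another form of f for an inhibited interconnect
[the arguments of F_ij and f, which involve U_i, U_j, V_DD
and the threshold voltage of the interconnect transistor,
are illegible in the source]


i = 1, ..., N; j = 1, ..., N+M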


where K is the ratio of transistor conductivity factor to 
neuron input capacitance, V_DD is the power supply
voltage, N is the number of neurons, M is the external 
input number into the network, and f is the characteristic 
function of the floating-gate transistor. The first term 
inside the bracket on the right hand side of the above 
equation represents the network feedback and the 
second term represents the feed-forward part of the 
network. This neural network has been constructed 
based on the nonlinear characteristics of the transistor. 


Figure 2 shows the circuit of the floating-gate NMOS
transistor neural network based on such a neuron structure.
It is comprised of three parts. The first part is the basic
network, which contains 8 x 8 basic interconnected
elements. The second part controls the operation and
includes three groups of transistors [the transistor and
control-signal designations are illegible in the source].
The first group controls the interconnect states: when its
control signal is at a high voltage, the network operates
in a feedback mode, and when the signal is at a low
voltage, the network operates in a non-feedback mode. The
second group clears the neuron dendrite lines (output). The
third group controls the signal input. In addition, its
presence avoids a direct connection of the gate of the
neuron to the input pin, which protects the programming in
the network when the corresponding control signal is low.
The third part is a program control circuit comprised of
two groups of transistors. They switch to select whether an
excited or an inhibited interconnect is programmed: when
the first program control signal is high and the second is
low, the excited interconnect is programmed; when the first
is low and the second is high, the inhibited interconnect
is programmed.


Figure 1. Floating-Gate NMOS Neuron Structure
























































JPRS-CST-93-004 
3 March 1993 
[Circuit diagram not reproduced]


Figure 2. Chip Circuit





This NMOS transistor neural network has the feature of 
a distributed neuron structure. One disadvantage of the 
conventional resistor amplifier neural network is that the 
fan-in and fan-out of the amplifier must be increased as 
the network expands. Furthermore, its ability to amplify 
the lowest signal cannot be reduced. This makes the 
design of the amplifier difficult. Hence, a distributed 
neuron network structure® was proposed. The number of 
amplifiers is increased in order to reduce the need to 
enhance the fan-in and fan-out capability of each ampli- 
fier. From a certain angle, each transistor in a transistor 
neural network accomplishes the function of a resistor 
and an amplifier in a distributed neuron network. Since 
this network possesses the features of a distributed 
neuron network, it may be conveniently cascaded into a 
larger network, or a network of a different structure. 
Thus, the adaptability and flexibility of the hardware are 
improved. In addition, this also facilitates the design of 
the neural network on the circuit board. 


In order to shorten the leads, the output lines are
designed to be horizontal in the chip layout, and the input
lines are perpendicular to the output lines. Each basic
element in the network consists of two floating-gate
NMOSFETs. One is the excited interconnect between the
programming electrode and the source, and the other is the
inhibited interconnect, which has an independently
programmed electrode. In order to make the excited and
inhibited interconnects as symmetric as possible, both
transistors have the same master pattern.


The circuit on the 3 μm floating-gate NMOS chip is 2.5
x 3.3 mm² in area. It includes 128 programmable NMOSFETs
and over 50 peripheral transistors, as shown in Figure 3
[photograph not reproduced]. Measured results show that the
breakdown voltage of the transistors in the peripheral
circuit is high enough to ensure that the programming
voltage can be applied to the basic element. The
programming characteristics of the basic interconnect
element are described in reference 7.


III. Hamming Network Constructed With Neural 
Network Chips 


The Hamming network is a double-layer network com- 
prised of a maximum network and a template-matching 
network. It can compare an input vector with existing 
template vectors and determine the matching template 
with the minimum Hamming distance. This is a 
common operation in pattern recognition. 
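

The matching step can be illustrated in software. This is a minimal sketch of minimum-Hamming-distance template matching, not a model of the chip circuit; the 5-element templates are hypothetical stand-ins for the 15-dimensional templates used in the experiment.

```python
def hamming_distance(a, b):
    """Number of positions in which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))

def best_match(input_vec, templates):
    """Return (index, distance) of the stored template closest to the input."""
    distances = [hamming_distance(input_vec, t) for t in templates]
    best = min(range(len(templates)), key=distances.__getitem__)
    return best, distances[best]

# Hypothetical templates, for illustration only.
templates = [(1, 1, 1, 0, 0), (0, 0, 1, 1, 1), (1, 0, 1, 0, 1)]
match = best_match((0, 0, 1, 1, 0), templates)
print(match)
```

In the hardware described here, the distance computation corresponds to the template-matching network and the final selection corresponds to the maximum network.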


A beneficial result of biological neural network research is
the discovery of the lateral inhibition effect. One of the basic
modes of the neural network established based on this 
principle is a competitive network. The survival- 
of-the-fittest network® is a classic competitive network. 
Its function is to locate the maximum among a set of 
input data by way of network operation. The neuron 
corresponding to the maximum input has the maximum 
output by means of competitive evolution while the other
output values become the minimum. When implementing
this function with a chip circuit, the network is
operating in a full-feedback mode. The interconnects 
between a certain neuron and other neurons are inhib- 
ited, while the connection to itself is excited. In a 
transistor neural network, an interconnecting transistor 
not only adjusts the weight but also amplifies. The 
relative control of the output current by each transistor 
reflects the strength of the interconnect matrix. The 
absolute control of the output current is a function of the 
electrical properties (such as high-voltage or low-voltage 
output, time delay, etc.) of the whole network.
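

The competitive behavior described above (each neuron excites itself and inhibits all others) can be sketched with a simple discrete-time iteration. This is an idealized software analogue under stated assumptions, not the transistor dynamics of the chip: the update rule, the inhibition strength and the stopping condition are all choices made for the illustration.

```python
def winner_take_all(inputs, inhibition=None, max_steps=1000):
    """Iterate mutual inhibition until at most one output remains positive.

    Each neuron keeps its own activity (self-excitation) and is reduced by a
    fraction of the summed activity of all other neurons (mutual inhibition).
    The default inhibition 1/(2n) is an assumption; it is kept below 1/(n-1)
    so that the largest input is never driven to zero.
    """
    n = len(inputs)
    eps = inhibition if inhibition is not None else 1.0 / (2 * n)
    y = [max(0.0, float(v)) for v in inputs]
    for _ in range(max_steps):
        total = sum(y)
        y = [max(0.0, v - eps * (total - v)) for v in y]
        if sum(1 for v in y if v > 0.0) <= 1:
            break
    return y

result = winner_take_all([0.2, 0.9, 0.5, 0.7])
winner = max(range(len(result)), key=result.__getitem__)
print(winner, result)
```

If two inputs are exactly equal and maximal, this simple rule cannot separate them, which is why the loop is bounded by max_steps.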


Another important application of the neural network is 
to compare an input vector to template vectors already 
stored in the network to calculate the degree of matching 
between the input vector and the template vector. In 
principle, this computation can be accomplished by 
three different interconnect methods.’ As for the sur- 
vival-of-the-fittest maximum network constructed by the 
chips described earlier, it is more appropriate to employ 
inhibitively interconnected elements to store the tem- 
plate-matching network. 


By connecting a survival-of-the-fittest network to a
matching network, we have a Hamming network. The matching
templates for numbers 1-8 are shown in Figure 4. Since each
template is a 15-dimensional vector, two 8 x 8 network
chips are required to complete the computation for template
similarity. Furthermore, an 8 x 8 network chip is required
to find the maximum for eight matching templates.
Therefore, the Hamming network for number recognition
requires three chips. Figure 5 shows the circuit for the
entire system. By means of weighted interconnection through
programming, the three chips are weight-adjusted to be
connected in the manner shown in the circuit. The figure
does not show any interconnects that correspond to the case
where the threshold voltage is higher than the source
voltage because they are not functional in the circuit. The
weight-adjusted circuit was linked to a computer to conduct
number recognition tests. Figure 6 shows the results of a
series of experiments. For every pair of numbers in the
figure, the left one represents the input vector and the
right one is the final recognition result.


IV. Variable-Threshold Logic Network Constructed
With Chip Circuit


The threshold logic concept was presented by McCulloch
and Pitts to describe neuron mathematical models and
Boolean functions. Variable-threshold logic is threshold
logic with variable interconnect weights and a variable
threshold. The threshold element is a logic element with
binary inputs and an output value of either “0” or “1.”
Its property is determined by the weights and the
threshold. A single threshold element can only be used to
implement a linearly separable function. A network
comprised of threshold elements can be used to implement
any arbitrary logic function.
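

A short software sketch makes the distinction concrete. The weights and thresholds below are hypothetical values chosen for the illustration; they are not taken from the chip. AND and OR are linearly separable and need one element each, whereas XOR is not and is built here from a small network of elements.

```python
from itertools import product

def threshold_element(inputs, weights, threshold):
    """Output 1 if the weighted sum of the binary inputs reaches the threshold."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= threshold else 0

def and_gate(x1, x2):
    return threshold_element((x1, x2), (1, 1), 2)

def or_gate(x1, x2):
    return threshold_element((x1, x2), (1, 1), 1)

def xor_network(x1, x2):
    """XOR from three elements: OR and AND feed a third element with weights (1, -1)."""
    return threshold_element((or_gate(x1, x2), and_gate(x1, x2)), (1, -1), 1)

truth_table = {(a, b): xor_network(a, b) for a, b in product((0, 1), repeat=2)}
print(truth_table)
```

Changing only the threshold turns the same two-input element from an OR gate into an AND gate, which is the sense in which the logic is "variable-threshold."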


The transistor neuro-chip can conveniently be used as a
variable-threshold element. Its circuit is shown in Figure
7. The threshold voltages of the inhibitively
interconnected transistors determine the weights, and the
threshold voltage of a further transistor determines the
threshold of the element [the transistor designations are
illegible in the source]. A third group of transistors
performs operations such as weighted input and threshold
comparison. The output of this threshold element is
complementary.


Figure 7. Transistor Neuro-Chip as a Threshold Element





As an example, a variable-threshold logic network has
been used to eliminate noise from a binary image. A 3 x 3
pixel window is moved sequentially across the binary image,
and the number of black pixels in the window at each
position is counted. In the new image, the output pixel is
black only when the count for the window is greater than or
equal to the threshold. This noise-cancellation method is
relatively simple to accomplish with a logic circuit.
Because the logic function in this method is linearly
separable, it can be easily implemented with a threshold
logic circuit.
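

The window-count rule can be sketched in a few lines of software. The assumptions here are that black pixels are coded as 1 and that pixels outside the image border count as white; neither convention is stated in the text, and the function name is hypothetical.

```python
def denoise(image, threshold):
    """Apply the 3 x 3 window-count rule to a binary image (list of rows).

    An output pixel is black (1) only if the number of black pixels in the
    3 x 3 window centered on it is at least `threshold`.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            count = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        count += image[rr][cc]
            out[r][c] = 1 if count >= threshold else 0
    return out

# An isolated black pixel (noise) is removed with threshold 3.
noisy = [[0] * 5 for _ in range(5)]
noisy[2][2] = 1
cleaned = denoise(noisy, 3)
print(cleaned)
```

Lowering the threshold keeps more black pixels and raising it removes more, which corresponds to the adjustable threshold of the hardware element.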


Figure 8 shows the results of the actual treatment. Panel
(a) shows the noisy input binary image; (b), (c) and (d)
are the processed results as a function of decreasing
threshold. Because the circuit operates internally in
analog mode, the interconnect strength can be programmed
to be continuously tunable. Hence, the threshold may be
chosen based on the condition of the circuit and the degree
of satisfaction with the treatment. This is quite different
from pure digital operations.


Figure 8. Experimental Result of Noise Cancellation of
Binary Images


As another example, a variable-threshold network capable
of processing a linearly inseparable logic function has
been used to detect the edges of a binary image.
Experimentally, the Laplace operator over a pixel and its
four neighbors is used to detect the edge of a binary
image. In this method, when all five pixels are identical,
the output is 0; otherwise, it is 1. This is a linearly
inseparable logic function and cannot be implemented by a
single threshold element. Despite the fact that this
function is linearly inseparable in 5-dimensional space, it
becomes a linearly separable logic function in a
higher-dimensional space. Let X_6 = X_1 + X_2 + X_3 + X_4 +
X_5 and use X_6 as the sixth input for the threshold
element. The logic function now becomes linearly separable
and can be dealt with by a single threshold element. Figure
9 shows the circuit used for edge detection using this
method. Theoretically, this algorithm can also be
accomplished by a variable-threshold logic network;
however, this requires more neuro-chips and is therefore
not used. Figure 10 shows the network treatment results,
where (a) is the 34 x 34 pixel input binary image, and (b),
(c) and (d) are the results obtained using different
threshold values. This edge detection example shows that
transistor neural networks can be used to implement not
only variable-threshold logic circuits but also common
logic gates. Therefore, the circuit used to implement a
logic function must be thoroughly considered in order to
select the simplest one.


Figure 10.
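

The effect of the sixth input can be checked exhaustively in software. The sketch below assumes that the "+" in the definition of the sixth input denotes the logical OR of the five pixels, which is consistent with the hybrid binary and threshold logic circuit mentioned in the conclusions (an arithmetic sum would leave the function linearly inseparable). The weights (-1 for each pixel, +5 for the sixth input) and the threshold of 1 are hypothetical values chosen for the illustration.

```python
from itertools import product

def edge_reference(pixels):
    """Specified behavior: 0 when all five pixels are identical, otherwise 1."""
    return 0 if len(set(pixels)) == 1 else 1

def edge_threshold_element(pixels):
    """Single threshold element with a sixth input x6 = OR of the five pixels."""
    x6 = 1 if any(pixels) else 0
    weights = (-1, -1, -1, -1, -1, 5)  # assumed weights, not from the source
    s = sum(w * x for w, x in zip(weights, tuple(pixels) + (x6,)))
    return 1 if s >= 1 else 0

# Compare the two definitions over all 32 possible five-pixel windows.
mismatches = [p for p in product((0, 1), repeat=5)
              if edge_reference(p) != edge_threshold_element(p)]
print(len(mismatches))
```

With these weights the weighted sum is 0 when all pixels are white or all are black, and between 1 and 4 otherwise, so a single threshold separates the two cases.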





V. Conclusions 


A simple neuro-chip with variable interconnect weight 
has been designed and fabricated using 3 μm floating-gate
NMOS technology. This network behaves like a
distributed neuron structure and can be cascaded into a 
large-scale network. The main part of the circuit is an 8 
x 8 fully interconnected matrix, equivalent to an 8- 
neuron network. Weight storage is done by the floating- 
gate structure. 


By connecting several floating-gate NMOS networks, a 
Hamming network was formed for numeral recognition 
and a variable-threshold logic network was created for 
noise cancellation of binary images. A binary and vari- 
able-threshold hybrid logic circuit was also formed to 
detect edges of binary images. All results are satisfactory. 
In these applications, the interconnect weights were
electrically written and survived numerous write/erase
cycles. This shows that the chip is promising and exhibits
a high level of flexibility in various applications.


Abstract


The optimum-learning-rate backpropagation algorithm is
reviewed. The equations for computing the optimum learning
rates of several commonly used networks are presented.
Problems associated with the implementation of the
algorithm are discussed. The speed of the algorithm is
further illustrated by experimental simulations.


Key words: optimum learning rate, BP algorithm,
multilayer neural network, prediction.


I. Introduction 


In theory, it has been demonstrated that a layered
feed-forward neural network can be employed to approximate
any given continuous mapping f: R^N → R^M by properly
selecting the topological architecture and the interconnect
weights. This type of neural network plays an important
role in artificial neural network research, especially in
the recognition and classification of sensory signals such
as voice and image, as well as in nonlinear signal
processing applications such as adaptive filtering,
adaptive control, and mapping approximation. This type of
network model is widely used as a critical part of a system
or as the entire system itself. The mapping created by such
a network often becomes an important factor affecting the
performance of the system. Nevertheless, we have not found
an effective algorithm to automatically design the
topological architecture and interconnect weights of a
multilayer feed-forward network that provides a fixed
mapping relation. In practice, the task is converted to a
two-stage experimental design optimization problem. This
involves selecting an architecture and using a learning
algorithm to train the weights so that the target function
of the network reaches an optimum or satisfactory level for
this particular architecture. Then, the network
architecture is changed and the training of the
interconnect weights is repeated. Finally, all the results
are compared to choose the combination of architecture and
weights providing the optimum target function. The total
time required to complete such a two-stage design process
is equal to the number of network architectures chosen
multiplied by the average time required to obtain the
optimum or a satisfactory set of weights for each
architecture. Therefore, with a given network architecture
and initial weights, it is of practical significance to
reduce the learning time required to find the weights that
provide a satisfactory target function value.


In practice, backpropagation (BP)¹ is the most popular
algorithm for learning the weights of a multilayer
feed-forward neural network. However, conventional BP is
seriously affected by the arbitrary choice of its constant
parameters. With a given network architecture and initial
interconnect weights, one often needs to select different
parameters for the algorithm and repeat the learning
process to obtain a satisfactory solution. This not only
significantly slows down the design process but also
diverts focus away from system architecture and
functionality.


Specifically with reference to the difficulty caused by the
arbitrary choice of constant parameters in backpropagation,
a method to analyze the learning rate of an adaptive and
improved BP algorithm is presented in reference 2. This
method is formalized in reference 3, where general formulas
are also provided. It essentially generates a second-order
characteristic curve of the parameter (i.e., the learning
rate μ) along the direction of fastest decline of the
target function by local linearization of the nonlinear
processing elements in the network, and this curve is used
to locally approximate the target function. The
second-order characteristic curve is completely determined
by the current iteration gradient, the error of the
processing elements in the output layer and its
perturbation. Its shape varies as the curvature of the
target function at the point of interest changes. In every
iteration, the optimum learning rate μ* always shifts the
weights toward the minimum of the characteristic curve
generated in the same iteration. The purpose of having the
learning rate adapt to the curvature of the target function
of the network is to overcome the difficulty associated
with the “tuning of the parameter of the algorithm,” to
improve the degree of approximation of the mapping created
by the network, and to speed up network learning. This
paper describes this algorithm and reviews its results.
Problems associated with the implementation of the
algorithm are also discussed. Specific formulas to
calculate the optimum learning rates in frequently used
networks are provided. The speed of the algorithm is also
illustrated by way of simulation. (The derivation and proof
of the algorithm are presented in reference 4.) [passage
omitted]
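

The idea of choosing the learning rate from a second-order curve along the steepest-descent direction can be sketched as follows. This is not the paper's formula, which is in the omitted passage. It is a generic illustration under stated assumptions: the target function is E = 0.5 * sum(e_k^2), the output error is linearized along the negative gradient as e(μ) ≈ e + μ d, where d is the perturbation of the error obtained from a small probe step, and the quadratic E(μ) is minimized at μ* = -(e·d)/(d·d). The toy model y = w0 + w1 x and all names are hypothetical.

```python
def residuals(w, samples):
    """Output errors e_k = target - output for the toy model y = w0 + w1 * x."""
    return [t - (w[0] + w[1] * x) for x, t in samples]

def gradient(w, samples):
    """Gradient of E = 0.5 * sum(e_k ** 2) with respect to (w0, w1)."""
    e = residuals(w, samples)
    return [-sum(e), -sum(ek * x for ek, (x, _) in zip(e, samples))]

def energy(w, samples):
    return 0.5 * sum(ek * ek for ek in residuals(w, samples))

def optimum_rate(w, samples, delta=1e-6):
    """Minimize the local quadratic E(mu) = 0.5 * |e + mu * d|^2 over mu."""
    e = residuals(w, samples)
    g = gradient(w, samples)
    probe = [wi - delta * gi for wi, gi in zip(w, g)]
    # Perturbation of the output error per unit step along the negative gradient.
    d = [(pk - ek) / delta for pk, ek in zip(residuals(probe, samples), e)]
    dd = sum(dk * dk for dk in d)
    if dd == 0.0:
        return 0.0, g
    return -sum(ek * dk for ek, dk in zip(e, d)) / dd, g

samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # generated by y = 1 + 2x
w = [0.0, 0.0]
history = [energy(w, samples)]
for _ in range(20):
    mu, g = optimum_rate(w, samples)
    w = [wi - mu * gi for wi, gi in zip(w, g)]
    history.append(energy(w, samples))
print(history[0], history[-1])
```

No learning rate is set by hand here: each iteration recomputes μ* from the current gradient, the output error and its perturbation, which are the same three quantities the text names as determining the characteristic curve.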


IV. Computer Simulation Experiment 


The objective of the experiment is to observe any 
improvement of the optimum-learning-rate BP algo- 
rithm over the conventional BP algorithm, in terms of 
learning speed and the ability of the network to create an 
approximation of the mapping desired. Furthermore, the 
effect of the initial weight on the results is also investi- 
gated. In addition, a comparison of the cost-to-benefit 
relation of computation is also made. 


The objective of the experiment is to employ a single- 
hidden-layer feed-forward neural network to learn the 
Feigenbaum mapping produced by the following non- 
linear iteration: 


x(n+1) = rx(n)[1-x(n)], n = 0, 1, 2... (22) 


where the initial value satisfies 0 < x(0) < 1, and r is a
control parameter. In our experiment, x(0) = 0.0100004 and
r = 4.0. Equation (22) generates a chaotic time sequence by
way of iteration.
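

Equation (22) with the stated initial value and control parameter is straightforward to iterate. The sketch below generates the sequence and pairs each value with its successor; treating 16 consecutive pairs as one data block is an assumption about how the block length T = 16 mentioned in the experiment is used, and the function name is hypothetical.

```python
def feigenbaum_sequence(x0, r, length):
    """Iterate x(n+1) = r * x(n) * (1 - x(n)) and return the first `length` values."""
    xs = [x0]
    for _ in range(length - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

seq = feigenbaum_sequence(0.0100004, 4.0, 17)
# Each pair is (network input x(n), ideal output x(n+1)).
pairs = list(zip(seq[:-1], seq[1:]))
print(len(pairs), pairs[0])
```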


The purpose of using a layered feed-forward neural network
to learn the Feigenbaum mapping is to let the weights of
the network begin learning from random initial values.
Eventually, the network should produce an output x̂(n + 1)
in response to an input x(n) in accordance with equation
(22), i.e., |x̂(n + 1) - x(n + 1)|² is held to a minimum.
This can also be viewed as a prediction, by the layered
feed-forward neural network, of the time sequence generated
by equation (22). In the learning process, each
experimental data block has length T = 16, and each block
renews an input and ideal output pair.


The architecture of the network is chosen arbitrarily. In
our experiment, we selected a single-hidden-layer
feed-forward network with nonzero thresholds, made up of
one linear input processing element, six hidden processing
elements with sigmoid nonlinear response functions, and one
linear output processing element. The computation load for
each iteration is shown in Table 1.
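
A forward pass through such a (1 6 1) network might look like the sketch below. The dictionary layout, the uniform initialization range and the convention of adding the threshold as a bias term are assumptions made for illustration; the paper does not give these details.

```python
import math
import random


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def init_network(hidden=6, seed=0):
    """Random initial weights and thresholds (range is an assumption)."""
    rng = random.Random(seed)
    return {
        "w_in": [rng.uniform(-1.0, 1.0) for _ in range(hidden)],   # input -> hidden
        "b_hid": [rng.uniform(-1.0, 1.0) for _ in range(hidden)],  # hidden thresholds
        "w_out": [rng.uniform(-1.0, 1.0) for _ in range(hidden)],  # hidden -> output
        "b_out": rng.uniform(-1.0, 1.0),                           # output threshold
    }


def forward(net, x):
    """Linear input, six sigmoid hidden elements, linear output."""
    hidden = [sigmoid(w * x + b) for w, b in zip(net["w_in"], net["b_hid"])]
    return sum(v * h for v, h in zip(net["w_out"], hidden)) + net["b_out"]


def count_parameters(net):
    return len(net["w_in"]) + len(net["b_hid"]) + len(net["w_out"]) + 1


net = init_network()
prediction = forward(net, 0.0100004)
```

Under these assumptions the network has 19 adjustable parameters (6 + 6 + 6 + 1).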





Table 1. Computations Required for Each Iteration for the
(1 6 1) Network With T = 16


Operation               | Conventional BP | Optimum-learning-
                        | algorithm       | rate BP algorithm
Multiplication/division | 853             | 1302
Addition/subtraction    | 512             | 830
Total                   | 1365            | 2132





The experiment begins with two sets of initial weights.
For each set of initial weights, the conventional BP
learning rate is set to 0.005, 0.01, 0.05 and 0.1 in turn,
giving four corresponding learning processes, for a total
of eight learning processes over the two sets. For each
learning process, the relative rms error of the network
output (i.e., the ratio of the norm of the output error
matrix to the norm of the ideal output matrix) is recorded
as a function of the number of iterations. The results for
the two sets of initial weights are plotted in Figures 3(a)
and 3(b), respectively. The two learning processes of the
optimum-learning-rate algorithm for these two sets of
initial weights are also shown in Figures 3(a) and 3(b).
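
The error measure described above can be sketched as follows. The conversion to decibels as 20·log10 of the norm ratio is an assumption (the text quotes errors in dB without stating the convention), and the function name is hypothetical.

```python
import math


def relative_rms_error_db(outputs, targets):
    """Ratio of the error norm to the ideal-output norm, in dB.

    Assumption: dB = 20 * log10(ratio). A zero error is not handled here.
    """
    err_norm = math.sqrt(sum((o - t) ** 2 for o, t in zip(outputs, targets)))
    ref_norm = math.sqrt(sum(t ** 2 for t in targets))
    return 20.0 * math.log10(err_norm / ref_norm)


# Outputs that are off by 10 percent of the ideal values.
example_db = relative_rms_error_db([1.1, 2.2], [1.0, 2.0])
```

Under this convention a norm ratio of 0.1 corresponds to -20 dB.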


For group A, as shown in Figure 3(a), the four learning
curves obtained with the four learning rates of the
conventional BP algorithm essentially coincide with each
other. The convergence error is approximately -5 dB.
(Experimentally, it showed no apparent improvement even
when the number of iterations was increased to over 1,000.)
In comparison, the learning curve of the
optimum-learning-rate BP algorithm showed a significant
drop in about 100 iterations. In the vicinity of 300
iterations, it falls to a satisfactory level and the
convergence error is under -20 dB. For group B, as shown in
Figure 3(b), the four conventional BP curves are quite
different. The 0.01-learning-rate curve has the best
convergence rate and error. In the neighborhood of 500
iterations, its convergence error is about -40 dB. The
optimum-learning-rate BP algorithm is still better than the
four conventional BP learning curves. In approximately 500
iterations, the convergence error has declined to below
-50 dB, corresponding to an improvement in mapping accuracy
of 10 dB.
Figure 3. Comparison of Learning Process of Conventional BP
Algorithm to Optimum-Learning-Rate BP Algorithm at Various
Learning Rates; (a) and (b) are learning curves
corresponding to initial values A and B.











If the number of arithmetic operations required to reach
a specified convergence error is used to judge the
convergence speed, then the optimum-learning-rate BP
algorithm requires 5.3 x 10° arithmetic operations to reach
-20 dB in group A. The conventional BP algorithm 
would only reach the -5 dB level after 2.7 x 10° opera- 
tions. In group B, the conventional BP algorithm and 
optimum-learning-rate BP algorithm require 2.7 x 10° 
and 3.2 x 10° operations, respectively, to reach a con- 
vergence error of -40 dB. The optimum-learning-rate BP 
algorithm saves approximately an order of magnitude of 
arithmetic operations. 


V. Discussion 


Based on an analysis of the above experimental results, 
the following conclusions can be reached. 


(1) When using the conventional BP algorithm to train a
multilayer feed-forward network, it is very difficult to
guess parameters that produce a satisfactory convergence
speed and convergence error, because so many factors (such
as the initial weights) affect the parameter setting.
(2) Compared to the conventional BP algorithm, the
optimum-learning-rate BP algorithm significantly improves
the convergence rate and error of the learning process.
The experiment cannot rule out the possibility that some
parameter, out of numerous guesses, would give the
conventional BP algorithm an even more satisfactory
convergence rate and error than the optimum-learning-rate
BP algorithm. However, such a parameter could only be found
by comparing the results of numerous learning processes
after repeated guessing and learning. Such fine-tuning of
the parameter is something we wish to avoid in artificial
neural networks.


(3) In each iteration, learning-rate optimization adds
approximately 56 percent to the computation of the
conventional BP algorithm (2132 versus 1365 operations in
Table 1). However, this additional computation load greatly
reduces the number of iterations and significantly improves
the mapping accuracy, which accelerates the overall
learning process.


(4) The optimum-learning-rate BP algorithm still retains 
the characteristics of a gradient algorithm. Hence, the 
selection of the initial weight remains important. Usu- 
ally, the initial value may be changed several times 
during training in order to improve the mapping char- 
acteristics generated by the network. 


These four aspects as a whole are instrumental to the fast 
nature of the optimum-learning-rate BP algorithm. 
[Text]


Abstract 


This paper presents a new scheme for the optical imple- 
mentation of a bipolar WTA (Winner-Take-All) triple- 
layer neural network model and its experimental results. 
In the full bipolar mode, the input to the ith neuron in 
the middle layer consists of two parts. One is the input to 
that neuron in the unipolar mode and the other is one 
half of the reverse pattern corresponding to the stored 
pattern. The constant 1/2 can be implemented by dividing
the input to each neuron into two equal parts, i.e., an
opaque and a transparent part. Experimentally, the
bipolar system was found to have higher storage capacity 
and addressability than a unipolar WTA system. 


Key words: WTA neural network model, bipolar neural
state, pattern recognition, multi-channel inner-product
hologram.


I. Introduction 


In recent years, neural networks have been a hot research 
topic worldwide. Many new models have been 
presented,¹ and one of them is the WTA neural network
model. It is believed that the WTA neural network is a 
mechanism in the brain.2*> The WTA neural network not 
only has a high storage capacity and addressability but 
also can implement independent association and mutual 
association.*® Furthermore, the connection weights of 
WTA are 0, 1 (unipolar), or +1, -1 (bipolar). A bipolar
WTA model has larger storage capacity and higher
addressability than its unipolar counterpart. Nevertheless,
because negative neuron inputs and interconnects are
involved, it is somewhat difficult to implement optically.
The implementation of bipolar connections has been reported
in several studies.⁷⁻¹⁰ However, there has been no report
on the simultaneous implementation of bipolar neuron states
and interconnections. Wang Xuming and Mu Guoguang proposed
a simple scheme to implement a bipolar Hopfield model by
adding an additional row of elements on the interconnect
mask to take advantage of a preset distributive
background.¹¹ I. Shariv used a birefringent crystal to
split a polarized light beam into two orthogonally
polarized beams to represent the positive and negative
neuron states and implemented a bipolar triple-layer
network in a double-pass system.¹² In this work, our
original unipolar optoelectronic hybrid WTA
pattern-recognition system¹³ has been improved, and an
optical pattern-recognition system that implements the
bipolar WTA neural network is presented. When a stored
pattern contains another stored pattern, a unipolar system
cannot accurately recognize it. For example, when shi
[0577] and wang [3769] are both stored patterns and the
input is shi, the original unipolar system cannot
accurately judge whether the input is shi or wang, whereas
a bipolar system can. Therefore, the storage capacity and
addressability of the network are significantly increased.
II. Bipolar WTA Neural Network Model


1. WTA Neural Network Model 


Figure 1 shows the architecture of the WTA neural network
model. It is a triple-layer neural network with a WTA
hidden layer.“* Each neuron in the hidden layer corre- 
sponds to a stored pattern. The number of neurons is equal 
to the number of stored patterns M. The connection
between the jth neuron in the hidden layer and the ith
neuron in the input layer is W_ij. The interconnect between
the ith and jth neurons in the hidden layer is T_ij. The
connection between the jth neuron in the hidden layer and
the ith neuron in the output layer is W'_ij. When W_ij is
equal to W'_ij, the network implements homo-association
based on content addressability. Otherwise, it implements
hetero-association.








Figure 1. WTA Neural Network Model 





The input pattern is weighted by W_ij and summed to
determine its likelihood with every stored pattern. This
likelihood is the input to the corresponding neuron in the
hidden layer. When W_ij is the component V_i^j of the jth
stored pattern at the ith neuron, the likelihood of the
input pattern to the stored pattern is their inner product,
i.e., the input to the jth neuron in the hidden layer is
the inner product of the jth stored pattern and the input
pattern. In the hidden layer, due to the connection T_ij,
only the output from the neuron with the largest input is
nonzero. The rest are all 0, which completes the WTA
operation. When the inner products between two or more
stored patterns and the input pattern are equal, the
network cannot properly recognize them: its output will
include all or none of them. Hence, one of the primary
methods to raise the storage capacity and addressability of
the model is to alter the weight W_ij to eliminate the
possibility that more than one stored pattern has the same
inner product with the input pattern. In the bipolar WTA
model, W_ij = X_i^j, where X_i^j is either +1 or -1.
Compared to the unipolar WTA model (W_ij = V_i^j, where
V_i^j is either 1 or 0), it eliminates the problem where
the system cannot accurately distinguish between two
patterns when one pattern is contained in another.
Therefore, the bipolar WTA model has larger storage
capacity and better content addressability.
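
The tie that arises when one stored pattern is contained in another can be illustrated with a small sketch. The two 5 x 3 bitmaps below are hypothetical stand-ins for the characters shi and wang (the shi pattern is a subset of the wang pattern); they are not the patterns used in the experiment, and all function names are assumptions.

```python
# Hypothetical 5 x 3 unipolar bitmaps; SHI is contained in WANG.
SHI = [[0, 0, 0],
       [0, 1, 0],
       [1, 1, 1],
       [0, 1, 0],
       [0, 0, 0]]
WANG = [[1, 1, 1],
        [0, 1, 0],
        [1, 1, 1],
        [0, 1, 0],
        [1, 1, 1]]


def flatten(pattern):
    return [v for row in pattern for v in row]


def to_bipolar(unipolar):
    """Map 0/1 components to -1/+1."""
    return [2 * v - 1 for v in unipolar]


def inner(a, b):
    return sum(x * y for x, y in zip(a, b))


def winners(scores):
    """Indices of all hidden neurons that share the largest input."""
    best = max(scores)
    return [i for i, s in enumerate(scores) if s == best]


stored = [flatten(SHI), flatten(WANG)]
probe = flatten(SHI)  # the input is shi

unipolar_scores = [inner(s, probe) for s in stored]
bipolar_scores = [inner(to_bipolar(s), to_bipolar(probe)) for s in stored]
```

With these bitmaps the unipolar inner products tie at 5 and 5, so both hidden neurons share the largest input, whereas the bipolar inner products are 15 and 3 and only the shi neuron wins.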


2. Bipolar Neuron and Interconnect Weight in WTA 
Model 


The stored pattern in this work is a two-dimensional
array. The neuron (hl) in the hidden layer and the neuron
(jk) in the input layer are connected by a four-dimensional
tensor W_jkhl. X_jk^hl and X_jk represent the bipolar form
of the jk component of the hl-th stored pattern and the jk
component of the input pattern, respectively. Let the
interconnect W_jkhl be X_jk^hl. Then, the input μ_hl to
neuron hl of the hidden layer is


    \mu_{hl} = \sum_{k=1}^{K} \sum_{j=1}^{J} W_{jkhl} X_{jk}
             = \sum_{k=1}^{K} \sum_{j=1}^{J} X_{jk}^{hl} X_{jk}    (1)


where h = 1, 2, ..., H and l = 1, 2, ..., L give the
position of a stored pattern in the H x L matrix, and
H x L = M is the number of stored patterns; j = 1, 2, ..., J
and k = 1, 2, ..., K index the components of a pattern,
each stored pattern is a J x K matrix, and J x K = N is the
number of neurons in the stored pattern.


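Any bipolar pattern (X_jk) can be expressed in terms of its
unipolar format (V_jk):


    X_{jk} = 2V_{jk} - 1    (2)


Substituting equation (2) into (1), we have


    \mu_{hl} = \sum_{k=1}^{K} \sum_{j=1}^{J} (2V_{jk}^{hl} - 1)(2V_{jk} - 1)
             = 4 \left[ \sum_{k=1}^{K} \sum_{j=1}^{J} V_{jk}^{hl} V_{jk}
               - \frac{1}{2} \sum_{k=1}^{K} \sum_{j=1}^{J} V_{jk}^{hl}
               - \frac{1}{2} \sum_{k=1}^{K} \sum_{j=1}^{J} V_{jk} \right] + N    (3)


The relation between a pattern (V_jk^hl) and its
corresponding reverse pattern (V̄_jk^hl) can be expressed as
follows:


    \bar{V}_{jk}^{hl} = 1 - V_{jk}^{hl}    (4)


Hence,


    \sum_{k=1}^{K} \sum_{j=1}^{J} V_{jk}^{hl}
        = N - \sum_{k=1}^{K} \sum_{j=1}^{J} \bar{V}_{jk}^{hl}    (5)


From equation (3), we get


    \mu_{hl} = 4 \left[ \sum_{k=1}^{K} \sum_{j=1}^{J} V_{jk}^{hl} V_{jk}
               + \frac{1}{2} \sum_{k=1}^{K} \sum_{j=1}^{J} \bar{V}_{jk}^{hl}
               - \frac{1}{2} \sum_{k=1}^{K} \sum_{j=1}^{J} V_{jk} \right] - N    (6)


For a network with a given number of neurons and a given
input pattern (V_jk),


    C = N + 2 \sum_{k=1}^{K} \sum_{j=1}^{J} V_{jk}    (7)


is a constant. Hence, equation (6) can be written as:


    \mu_{hl} = 4 \sum_{k=1}^{K} \sum_{j=1}^{J}
               \left( V_{jk}^{hl} V_{jk} + \frac{1}{2} \bar{V}_{jk}^{hl} \right) - C    (8)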








JPRS-CST-93-004 
3 March 1993 


We already know that each neuron in the hidden layer
corresponds to a stored pattern and that only the output of
the neuron with the largest input is nonzero. Obviously,
the output of a neuron in the hidden layer depends only on
the relative value of μ_hl and is independent of its
absolute value. Hence, the constant C and the factor 4 in
equation (8) can be omitted. Thus, μ_hl can be simplified
as follows:


    \mu_{hl} = \sum_{k=1}^{K} \sum_{j=1}^{J}
               \left( V_{jk}^{hl} V_{jk} + \frac{1}{2} \bar{V}_{jk}^{hl} \right)    (9)


Based on equation (9), this bipolar WTA model may be
implemented by connecting unipolar patterns. The input to
the hidden layer of the bipolar neural network is comprised
of two parts. One is the input to the corresponding
unipolar neural network and the other is one-half of the
sum of all components of the reversed unipolar stored
pattern. The second part can be implemented by adding a
translucent plate of identical arrangement to the stored
pattern (as shown in Figure 2) next to the input pattern
and by incorporating a reversed unipolar pattern in the
interconnection (as shown in Figure 3). Cutting the
transmittance to 1/2 is implemented by dividing a neuron
into two parts; one has a transmittance of 1 and the other
0. Thus, the difficulty associated with the accurate
control of a transmittance of 1/2 is avoided.
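
A quick numerical check can confirm that the simplified hidden-layer input, the unipolar inner product plus one-half of the sum of the reversed stored pattern, ranks the stored patterns exactly as the full bipolar inner product does. The random test patterns, the pattern length and the function names below are assumptions used only for illustration.

```python
import random


def mu_bipolar(stored, probe):
    """Bipolar inner product of two patterns given in unipolar (0/1) form."""
    return sum((2 * s - 1) * (2 * v - 1) for s, v in zip(stored, probe))


def mu_simplified(stored, probe):
    """Unipolar inner product plus half the sum of the reversed stored pattern."""
    return sum(s * v + 0.5 * (1 - s) for s, v in zip(stored, probe))


def constant_c(probe):
    """C = N + 2 * (sum of the input pattern); the same for every stored pattern."""
    return len(probe) + 2 * sum(probe)


rng = random.Random(1)
n = 12  # hypothetical number of neurons per pattern
probe = [rng.randint(0, 1) for _ in range(n)]
stored_patterns = [[rng.randint(0, 1) for _ in range(n)] for _ in range(5)]

bipolar_inputs = [mu_bipolar(s, probe) for s in stored_patterns]
simplified_inputs = [mu_simplified(s, probe) for s in stored_patterns]
c = constant_c(probe)
```

Because the two quantities differ only by the positive factor 4 and the pattern-independent constant C, the winner-take-all layer selects the same neuron in both cases.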


Figure 2. Input Pattern "+"


Figure 3. Stored and Reversed Stored Patterns of
Interconnected Holographic Recording





The interconnect T_{hl,h'l'} of neurons in the hidden layer
is implemented through the use of an electronic network, as
shown in Figure 4. It is dependent upon the maximum input
of the neuron. In the iteration process, the neuron with
the highest input is excited and provides an output of 1.
The others are inhibited and provide an output of 0.











Figure 4. WTA Electronic Network 





When the interconnection between the hidden layer and
the neuron in the output layer, W'_jkhl, is equal to
W_jkhl, the system performs self-association. Otherwise, it
carries out mutual association. When W'_jkhl is
simultaneously chosen to be equal to and not equal to
W_jkhl, the system performs self-association and mutual
association at the same time.


III. Bipolar Multi-Channel Interconnection Holography


Because of advantages such as high storage capacity,
solid space distribution and distributive storage of
multiple three-dimensional objects, memory holography is
widely used in optical neural networks.¹⁴⁻¹⁶ Holograms
usually play the role of interconnection in an optical
neural network. Figure 5 shows the optical configuration of
the interconnected bipolar holographic system. P_1 is the
input plane, and all the stored and reversed stored
patterns are paired on P_1. The various stored patterns are
separated in space. P_2 is the plane where a holographic
interference plate is placed to record the interconnection.
The distances from lens L_1 to P_1 and P_2 are d_1 and d_2,
respectively. Furthermore, the following image formation
relation is satisfied:


    1/d_1 + 1/d_2 = 1/f    (10)


where f is the focal length of L_1.
Figure 5. Optical Configuration of Interconnected
Holography
Let us assume that the continuous format of the hl-th stored pattern pair is [V_hl(j - a_h, k - a_l) + V̄_hl(j - a_{h'}, k - a_{l'})], i.e., V_hl(j - a_h, k - a_l) is the continuous format of the hl-th stored pattern (V_jk^hl), located at (a_h, a_l) on the input plane, and V̄_hl(j - a_{h'}, k - a_{l'}) is its reversed pattern, located at (a_{h'}, a_{l'}) on P_1. Then, the complex optical field distribution on P_2 is


    O_{hl}(\alpha, \beta) = c \{ [ V_{hl}(\alpha - a_h, \beta - a_l) + \bar{V}_{hl}(\alpha - a_{h'}, \beta - a_{l'}) ]
                            * h_{d_1}(\alpha, \beta) \} T_L(\alpha, \beta) * h_{d_2}(\alpha, \beta)


=cLV,,(—-2a+a,), —-2-(6+2,))+ V,,(——(a+a,), ——4 B+ a,)) 
2 2 2 2 
ikd; ¢o2 
exp| fd, (a?+ ) (11) 
Here, h_{d_1} and h_{d_2} are the pulse response functions of free space over the distances d_1 and d_2, respectively. T_L(x, y) represents the phase transformation introduced by lens L_1, * represents convolution, and c is a complex constant. Then, the complex optical field of the input object on plane P_2 is

    O(\alpha, \beta) = \sum_{h=1}^{H} \sum_{l=1}^{L} O_{hl}(\alpha, \beta)    (12)

A convergent spherical wave is used as the reference. Its complex optical field distribution on P_2 is:

    R(\alpha, \beta) = \exp\left\{ -\frac{ik}{2d_0} \left[ (\alpha - h_0)^2 + \beta^2 \right] \right\}    (13)

The total complex optical field on plane P_2 is

    O(\alpha, \beta) + R(\alpha, \beta) = \sum_{h=1}^{H} \sum_{l=1}^{L} O_{hl}(\alpha, \beta) + R(\alpha, \beta)    (14)





The optical field is recorded holographically. After the exposed plate is processed, the part of the amplitude transmittance that is related to first-order diffraction is


    t(\alpha, \beta) = O^{*}(\alpha, \beta) R(\alpha, \beta)
=¢ >> [V.,(- -Hata,), - oa 4: (B+a,)) 


elLel 


+Vai (—-4 (043), ---(8+4,))1 


exp(- fe L(a— ~h)? +#]— Se ikdy ¢q24 gr} (15) 


The stored patterns in this work contain nine Chinese characters; their holograms are shown in Figure 3. 


IV. Experimental Results 


Figure 6 shows an optoelectronic WTA pattern-recognition system. P_1 is its input plane. A grating is placed at P_2 behind the imaging lens L_1. P_3 is the interconnected hologram. P_4 is the output plane of the holographic plate. V'(j, k) is the input to the input plane. It consists of a unipolar input pattern and a translucent pattern, as shown in Figure 2.


Then, the complex optical field distribution on plane P_3 is


E(a,8)={(V’ (a, B) # hy,(@,8)]T,(a, B)T, (a, B)} 
h(a, 8)t(a, 
= ¢ SS texp(— th od 44 [(a—hy)?+ BJ} x 
0 


hellel 
{LV (—d,(@/d,+0,, —d,) (B+4,/d))Vai(—d,(@+4,,/d,) —d,(B+0,/d,)) 
explikd,(a,@+4,8)/d,+1/2exp Likd,(a, @+ a,B)/d,.]x 


V:(—d,(G+a,/d,), —d,(B+a,/d;))]} (16) 
where

    T_G(\alpha, \beta) = \frac{1}{4} \left[ 1 + \mathrm{sgn}(\cos \omega\alpha) \right] \left[ 1 + \mathrm{sgn}(\cos \omega\beta) \right]    (17)

is the transmittance of the grating, ω = 2πa/(λd_1), and a is the distance between various stored patterns. The complex
optical field distribution at output plane P_4 of the connected associative storage device is


E’ (0,9) =E CS, 9) * haolSs 2) 


d; a 





= ¢ $15 5[(dadbexp Cik(—Sr-a; + 25% )¢ +(e i+ r 3 


4elie) 


(V(—d,a/d,+a,, —d,B/d,+0,) x 
Vas (—d,Q/d,+0,, —d,B/d,+2,) 


4+ 1 /97. (—dia/do+ay. —d,B/do+a,)] 


Obviously, the bipolar inner product of input pattern 
V(j,k) and output pattern V,,(j,k) can be obtained on 
plane P,. It is located at (hy + dod,a’,/d2, dod,a’)), where 
a'_h = (a_h + a_{h'})/2 = ha and a'_l = (a_l + a_{l'})/2 = la.
An optoelectronic triode is used to convert the intensity
of the inner product to an electrical signal, and this
signal is transmitted to the WTA circuit in the hidden
layer. By means of the interconnection effect, the winner
takes all. Only the neuron with the largest input provides
a high-voltage output. The others send a low-voltage
output. This high-voltage signal is used to control an LED
that illuminates the corresponding self-association or
mutual association pattern and thus provides associative
recognition. Figure 7 [photograph not reproduced] shows the
results. The use of bipolar connections and bipolar neuron
inputs overcomes the problem wherein accurate
identification is not possible with a unipolar WTA system
when a pattern contains another pattern. The storage
capacity and addressability of the system have also been
improved.


Figure 6. WTA Pattern Recognition System
































V. Conclusions 


Both theoretical analysis and experimental results
illustrate that a bipolar WTA model has a larger storage
capacity and higher addressability. It is capable of
performing not only self-association but also mutual
association. In addition, it is invariant to illumination
and gray scale. This is because the output of the neurons
in the hidden layer depends only upon the relative, not the
absolute, value of their inputs. If a pre-processor is
placed in front of the system and the reference pattern is
replaced by a corresponding invariant characteristic, then
the system may also be invariant with reference to inner
and outer rotation, scale, and displacement.


The storage capacity of the system is primarily limited 
by the fabrication technique, such as the inhomogeneity 
of the multi-channel inner product holographic plate and 
optoelectronic devices. 


(18) 
[Text]


Abstract 


The Fourier descriptor method is extended to produce a 
set of invariants that are independent of the affine 
transformation. These invariants are used to train a 
triple-layer perceptron network for the identification 
and classification of aircraft. An accelerated learning 
algorithm is adopted to significantly reduce the learning 
time. Finally, the results of using such a neural network 
for identification and classification of aircraft and spec- 
ification of noise tolerance are also presented. 


Key words: affine transformation, invariant, neural net- 
work, backpropagation, classification. 


I. Introduction 


Identification of a two-dimensional pattern is an important
subject. Usually, this type of problem is treated under
similarity transformation, i.e., translation, rotation, and
magnification or reduction, under which the shape
intuitively remains invariant. When a plane pattern (e.g.,
an aircraft) is observed at different viewing angles,
however, its shape does not remain unchanged; it undergoes
an affine transformation. To this end, K. Arbter¹ proposed
identifying the pattern by finding a set of invariants
through Fourier transformation. This set of invariants is
given in the form of complex numbers. In this work, a new
method is presented to find these invariants. The set of
invariants is expressed in terms of real numbers, and phase
does not need to be taken into consideration in performing
the Fourier transformation. These invariants can be used to
train a three-layer perceptron to identify and classify
aircraft. One of the major problems in using a neural
network is the slow training process. To this end, an
accelerated learning algorithm is introduced to
significantly speed up the learning process. Finally, the
noise tolerance of this neural network classifier is
analyzed.


II. Brief Introduction to Affine Transformation 


Any three-dimensional motion of a rigid body can be
treated as a rotation about an axis that passes through
the origin and translation along the three axes.² The
equation of motion can be expressed as follows:


    \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}
      = R \begin{pmatrix} x \\ y \\ z \end{pmatrix}
      + \begin{pmatrix} dx \\ dy \\ dz \end{pmatrix}    (1)


where (dx, dy, dz) is the translation, R is a 3 x 3 rotation
matrix, and (x, y, z)^T and (x', y', z')^T represent the
coordinates of a point on the body before and after the
movement, respectively. Let us assume that the plane of
projection is z = 1; then their projections are:


    X' = x'/z', Y' = y'/z', X = x/z, Y = y/z    (2)


If the aircraft is far away from the image plane and varies
very slightly in the z direction, i.e., z' ≈ z, then the
transformation on the projection plane can be expressed as
follows:


    \begin{pmatrix} X' \\ Y' \end{pmatrix}
      = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
        \begin{pmatrix} X \\ Y \end{pmatrix}
      + \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}    (3)


Or, in vector format, it is:


    W' = AW + B    (4)


where A is a 2 x 2 matrix with det(A) ≠ 0, and W', W and B
are two-dimensional column vectors. This is affine
transformation.


We know that if matrix A can be expressed as A = aU, where a is a constant and U is an orthogonal matrix, then
equation (4) becomes a similarity transformation. Affine transformation does not place any such requirement on A.
Therefore, similarity transformation is a special case of affine transformation. Figure 1 shows a few examples of such transformations.
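
The step from the rigid-body motion to the affine form on the projection plane can be made explicit. The following is a sketch rather than the paper's own derivation: the entries r_ij of the rotation matrix R are notation introduced here, and z is treated as approximately constant over the body, in line with the far-field assumption z' ≈ z.

```latex
% Assumptions: r_{ij} denotes the entries of R (notation introduced for this
% sketch); z' \approx z and z is roughly constant over the body.
\begin{aligned}
X' &= \frac{x'}{z'} \approx \frac{x'}{z}
    = \frac{r_{11}x + r_{12}y + r_{13}z + dx}{z}
    = r_{11}X + r_{12}Y + \left( r_{13} + \frac{dx}{z} \right), \\
Y' &= \frac{y'}{z'} \approx \frac{y'}{z}
    = \frac{r_{21}x + r_{22}y + r_{23}z + dy}{z}
    = r_{21}X + r_{22}Y + \left( r_{23} + \frac{dy}{z} \right),
\end{aligned}
\qquad\text{so that}\qquad
A = \begin{pmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{pmatrix},
\quad
B = \begin{pmatrix} r_{13} + dx/z \\ r_{23} + dy/z \end{pmatrix}.
```

Under these assumptions A is the upper-left 2 x 2 block of R, which is in general not a scaled orthogonal matrix, so the projected shape undergoes an affine rather than a similarity transformation.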


Figure 1. Explanation of Affine Transformation (panels,
left to right: reference pattern, similarity
transformation, affine transformation, non-affine
transformation)








III. Determination of Invariants 


The key to performing a Fourier transformation is to
locate a parameter that varies linearly with respect to an
arbitrary affine transformation. Arbter¹,³ provided a
parameter t:


    t = \frac{1}{2} \int_c (x\,dy - y\,dx)    (5)


where c is the silhouette of the plane pattern, and proved
that the following conditions hold when B = 0 in equation
(4).


(1) The parameter varies linearly under affine
transformation, i.e., if t' is the parameter corresponding
to the pattern to be identified and t is the parameter for
the reference pattern, then constants c and d exist to
satisfy t' = c(t + d).


(2) This parameter is independent of the selection of the
initial point on the curve.


When B ≠ 0, the origin of the coordinate system can be
moved to B, where


    B = \iint_D W\,dx\,dy \Big/ \iint_D dx\,dy


and its accuracy can be proved as follows. If the reference
pattern satisfies


    \iint_D W\,dx\,dy \Big/ \iint_D dx\,dy = 0,


and a transformation W' = AW + B is made to change the
enclosed area from D to D', then


    B' = \iint_{D'} W'\,dx\,dy \Big/ \iint_{D'} dx\,dy
       = \iint_D (AW + B)\,dx\,dy \Big/ \iint_D dx\,dy = B.


Theorem 1. The translation in an affine transformation can
be determined by


    B = \iint_D W\,dx\,dy \Big/ \iint_D dx\,dy    (6)


If an arbitrary point on the edge is expressed as a vector
function


    W(t) = \begin{pmatrix} x(t) \\ y(t) \end{pmatrix},


then, after performing Fourier transformation with respect
to x(t) and y(t), a matrix of coefficients expressed in
terms of Fourier series is obtained.


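In equation (4), when B = 0, the discussion can be carried
out in two cases.


1) W'(t) = AW(t). Fourier transformation is done with
respect to both sides:


    \begin{pmatrix} X_n'^{r} + iX_n'^{i} \\ Y_n'^{r} + iY_n'^{i} \end{pmatrix}
      = A \begin{pmatrix} X_n^{r} + iX_n^{i} \\ Y_n^{r} + iY_n^{i} \end{pmatrix}


where X_n^r and X_n^i represent the real and imaginary
parts of the nth-order coefficient X_n. Based on the
definition of equal complex numbers, we have


    \begin{pmatrix} X_n'^{r} \\ Y_n'^{r} \end{pmatrix} = A \begin{pmatrix} X_n^{r} \\ Y_n^{r} \end{pmatrix},
    \qquad
    \begin{pmatrix} X_n'^{i} \\ Y_n'^{i} \end{pmatrix} = A \begin{pmatrix} X_n^{i} \\ Y_n^{i} \end{pmatrix}


Therefore,


    \begin{vmatrix} X_n'^{r} & X_n'^{i} \\ Y_n'^{r} & Y_n'^{i} \end{vmatrix}
      = |A| \begin{vmatrix} X_n^{r} & X_n^{i} \\ Y_n^{r} & Y_n^{i} \end{vmatrix}


2) W'(t') = W(t) and t' = c(t + d). After performing
Fourier transformation with respect to both sides, we have


    \begin{pmatrix} X_n'^{r} + iX_n'^{i} \\ Y_n'^{r} + iY_n'^{i} \end{pmatrix}
      = e^{inb} \begin{pmatrix} X_n^{r} + iX_n^{i} \\ Y_n^{r} + iY_n^{i} \end{pmatrix}


where b is a constant.


Let X_n = |X_n| e^{i\phi_x}, Y_n = |Y_n| e^{i\phi_y}; then,


    \begin{vmatrix} X_n^{r} & X_n^{i} \\ Y_n^{r} & Y_n^{i} \end{vmatrix}
      = |X_n| |Y_n| \begin{vmatrix} \cos\phi_x & \sin\phi_x \\ \cos\phi_y & \sin\phi_y \end{vmatrix}
      = |X_n| |Y_n| \sin(\phi_y - \phi_x)


    \begin{vmatrix} X_n'^{r} & X_n'^{i} \\ Y_n'^{r} & Y_n'^{i} \end{vmatrix}
      = |X_n| |Y_n| \begin{vmatrix} \cos(\phi_x + nb) & \sin(\phi_x + nb) \\ \cos(\phi_y + nb) & \sin(\phi_y + nb) \end{vmatrix}
      = |X_n| |Y_n| \sin(\phi_y - \phi_x)


Combining these two cases, we get:

Theorem 2. For an arbitrary affine transformation,


    E_n = \begin{vmatrix} X_n^{r} & X_n^{i} \\ Y_n^{r} & Y_n^{i} \end{vmatrix}


is a set of invariants.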
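
The two cases can be checked numerically on a single pair of Fourier coefficients. The sketch below is illustrative only: the coefficient values, the matrix A, the harmonic order n and the constant b are arbitrary assumptions, and the function name is hypothetical. It confirms that the determinant scales by det(A) in case 1 and is unchanged by the common phase factor in case 2, so ratios of such determinants taken at two different orders would be unaffected by both.

```python
import cmath
import math


def det_real_imag(x, y):
    """Determinant | X^r  X^i ; Y^r  Y^i | built from two complex coefficients."""
    return x.real * y.imag - x.imag * y.real


# Arbitrary nth-order coefficients of x(t) and y(t) (assumed values).
X = complex(0.8, -0.3)
Y = complex(0.2, 0.5)
E = det_real_imag(X, Y)

# Case 1: W'(t) = A W(t) with an arbitrary nonsingular A.
A = [[1.5, 0.4], [-0.2, 0.9]]
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
X1 = A[0][0] * X + A[0][1] * Y
Y1 = A[1][0] * X + A[1][1] * Y
E_case1 = det_real_imag(X1, Y1)

# Case 2: a change of parameter multiplies both coefficients by exp(i*n*b).
n, b = 3, 0.7
phase = cmath.exp(1j * n * b)
E_case2 = det_real_imag(X * phase, Y * phase)

# Polar form: |X| |Y| sin(phi_y - phi_x).
E_polar = abs(X) * abs(Y) * math.sin(cmath.phase(Y) - cmath.phase(X))
```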


Figure 2 shows six aircraft models to be identified. 
Figure 3 shows eight different configurations of model D 
by means of affine transformation. Table 1 lists the
invariants determined by the method described in this 
work. 
Table 1. Invariants Determined Corresponding to Aircraft Shown in Figure 3

Configuration | Invariants
A | -2.11 | 0.385 | -2.31 | 0.338 | 0.520 | -.813 | 0.401 | 0.171 | 0.163 | 0.312
B | -2.13 | 0.90  | -1.84 | 0.293 | 0.121 | -.792 | 0.132 | 0.159 | 0.332 | 0.107
C | -2.12 | 0.361 | -2.30 | 0.363 | 0.499 | -.843 | 0.356 | 0.155 | 0.228 | 0.293
D | -2.25 | 0.23  | -2.26 | 0.325 | 0.574 | -.927 | 0.188 | 0.254 | 0.347 | 0.206
E | -2.19 | 0.241 | -2.30 | 0.366 | 0.562 | -.877 | 0.283 | 0.220 | 0.262 | 0.251
F | -2.22 | 0.245 | -2.29 | 0.358 | 0.546 | -.886 | 0.302 | 0.225 | 0.276 | 0.265
G | -2.15 | 0.306 | -2.30 | 0.355 | 0.529 | -.846 | 0.346 | 0.194 | 0.234 | 0.277
H | -2.20 | 0.165 | -2.29 | 0.342 | 0.531 | -.910 | 0.24) | 0.201 | 0.317 | 0.224
Figure 2.


Figure 3. Configurations of Model D Obtained by Affine
Transformation


IV. Neural Network Classifier


Several neural network models are used as classifiers.
The most commonly used network is a multilayer perceptron.
A multilayer perceptron is usually trained by
backpropagation (BP). Let us assume that W_ij represents
the connection weight between neuron i and neuron j in the
subsequent layer. If d_j and y_j represent the ideal and
actual output of an output node, respectively, then the
overall error caused by a specific pattern can be expressed
as follows:


    E_p = \frac{1}{2} \sum_j (d_j - y_j)^2    (8)


The weight correction process in the conventional BP
method can be described as follows:


    \Delta W_{ij} = -\eta \frac{\partial E_p}{\partial W_{ij}}    (9)


where η is the learning rate. Weight correction is an
iterative process that continues until the convergence
condition |ΔW_ij| < c|W_ij| is met. As for the selection of
η, usually a certain value of η is given before the
learning process begins. If the converging process is slow,
then η is gradually increased. If η is found to be too
large, even causing oscillation, then η is gradually
reduced. There are no specific methods to raise or lower η.
Instead, one relies on experience. Usually η is chosen to
be a constant. In this work, η = 0.01. We found that when
the algorithm approaches its steady state, its convergence
rate becomes very slow; this is because the weight
correction method described earlier, i.e., the steepest
descent method, converges only linearly. We also know that
the Newton method, which corrects the weight according to


    \Delta W_{ij} = -\frac{\partial E_p / \partial W_{ij}}{\partial^2 E_p / \partial W_{ij}^2}


converges quadratically. However, its disadvantage is that
each step might be so wide as to cause oscillation.
Usually, the scheme is to use the steepest descent method
to bring the weights near the steady state and then use the
Newton method to correct the weights. The number of
iterations can then be drastically reduced. It is of great
significance to choose the right η in order to minimize the
number of iterations required. A method has been developed
to dynamically calculate η.* Assume


aE /( sy 


then η can be determined as follows:


    if α ≠ 0, then η = α if α < 20, and η = 20 otherwise;
    if α = 0, then η = 0.    (10)


After implementing this dynamic calculation of n in this 
work, the number of iterations was reduced from 900 to 
approximately 60. 
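
The contrast between a fixed small learning rate and a curvature-based step can be seen on a one-dimensional toy problem. The sketch below is not the paper's network or its dynamic-rate formula: the error function E(w) = cosh(w - 1), the starting point, the tolerance and all function names are assumptions chosen only to show slow linear convergence of steepest descent against the fast convergence of a Newton step. The last function mirrors the stated rule that caps the learning rate at 20, with α supplied by the caller.

```python
import math


def grad(w):
    """dE/dw for the toy error E(w) = cosh(w - 1), minimized at w = 1."""
    return math.sinh(w - 1.0)


def curvature(w):
    """d2E/dw2 for the same toy error (always positive)."""
    return math.cosh(w - 1.0)


def steepest_descent(w, eta=0.01, tol=1e-8, max_iter=100000):
    steps = 0
    while abs(grad(w)) > tol and steps < max_iter:
        w -= eta * grad(w)
        steps += 1
    return w, steps


def newton(w, tol=1e-8, max_iter=100):
    steps = 0
    while abs(grad(w)) > tol and steps < max_iter:
        w -= grad(w) / curvature(w)
        steps += 1
    return w, steps


def clamped_rate(alpha, cap=20.0):
    """Learning rate from alpha: 0 if alpha is 0, otherwise alpha capped at 20."""
    if alpha == 0:
        return 0.0
    return alpha if alpha < cap else cap


w_sd, sd_steps = steepest_descent(2.0)
w_nt, nt_steps = newton(2.0)
```

On this toy problem the fixed rate η = 0.01 needs well over a thousand iterations, while the Newton step converges in a handful.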
V. Experimental Results


In this work, a three-layer perceptron is chosen as the 
neural network classifier. This three-layer perceptron 
has 15 input nodes and 6 output nodes. The number of 
nodes in the hidden layer is adjustable. We picked a 
range of 10 to 50. For the six aircraft shown in Figure 
2, 15 different configurations are produced for each 
model by affine transformation. A total of 90 input 
samples are used to train this three-layer perceptron. In 
this work, the noise tolerance of this multilayer percep- 
tron was also investigated. For each aircraft model, a 
group of noise-superimposed images was produced to 
test the accuracy rate of this classifier. Assuming that 
each edge point can move arbitrarily within an L x L 
window centered around this point, the noise rises as L 
increases. For instance, Figure 4, from left to right, 
shows the configurations as the noise level increases 
from 1 to 5. At every noise level L, 15 test patterns are
generated for each aircraft. A total of 90 noise- 
superimposed images are used to test the accuracy of 
the network for identification of the aircraft. Figure 5 
shows the error rate as a function of noise level L. It
was found that even the network trained with noise-free
specimens has a high resistance to noise. The reason is
that it plays an important role in
determining some of the invariants and a secondary 
role in the remaining invariants. As the noise level 
rises, the impact on the primary invariants is far less 
than that on the secondary ones. The three-layer per- 
ceptron can depend on the primary invariant to per- 
form classification. Finally, noise-superimposed 
images were used to train this three-layer network and 
we found that it had a better noise tolerance. 
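
The noise model described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `add_edge_noise` is hypothetical, and the displacement is assumed to be uniformly distributed over the L x L window, which the text does not specify.

```python
import random

def add_edge_noise(points, L, rng=None):
    """Move every edge point to a random position inside an L x L window
    centered on the original point (uniform displacement is an assumption)."""
    rng = rng or random.Random()
    half = L / 2.0
    return [(x + rng.uniform(-half, half), y + rng.uniform(-half, half))
            for x, y in points]

# Example: one noisy copy of a small contour at noise level L = 3.
contour = [(0.0, 0.0), (10.0, 5.0), (4.0, 8.0)]
noisy_contour = add_edge_noise(contour, 3, random.Random(0))
```

Repeating the call 15 times per aircraft would give a test set of the size described in the text.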


Figure 4. Noise-Superimposed Models (Noise Level 1 to 5 from left to right) 


[Vertical axis: erroneous recognition] 
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Figure 5. 
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VI. Conclusions 


An aircraft identification method involving affine trans- 
formation is presented. The pattern recognition process is 
carried out by a BP network. This method is applicable to 
any plane pattern with a continuous and smooth border. 
The accelerated algorithm introduced can drastically 
reduce the computing time. It is general in nature and can 
be extended to many neural network learning processes. 
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[Text] Abstract 


This paper presents a dynamic-programming-based time 
normalization method to implement a DTW neural 
network. DTW (dynamic time warping) is one of the 
most effective methods for spoken word recognition. It is 
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very robust and provides the highest recognition rate 
possible. However, it takes too much computing time. 
Without special hardware, its implementation is 
limited by time. In this work, all the computation is 
carried out by two recurrent subnets and a memory 
layer. This method demonstrates the superiority of the 
hard-wired architecture. It offers a new strategy for the 
implementation of DTW by hardware. 


Key words: neural network, speech processing, pattern 
recognition. 


I. Introduction 


It is well known that because voice signals are highly 
random, speech patterns are often nonlinearly distorted 
along the time axis. How to resolve this problem, i.e., the 
time normalization problem, is an important issue in 
speech recognition. DTW is a nonlinear time normaliza- 
tion algorithm. Through the normalization of two input 
voice signals, their acoustic similarity can be maximized 
and their cumulative distance is minimized. 


When using DTW to recognize an isolated voice signal, 
the unknown templates are compared one by one with 
the reference templates until a reference template with 
the least cumulative distance is found and this template 
is the result. 


As we know, an effective algorithm can produce satisfac- 
tory results with the appropriate hardware. Recent 
studies*? show that the neural network is an important 
technique to accomplish optimization. In this paper, we 
will demonstrate the use of a neural network to solve the 
dynamic normalization problem. Furthermore, an archi- 
tecture is provided for the implementation of this neural 
network with optics and semiconductors. 


The second part of this paper briefly discusses the DTW 
algorithm. The third part shows a neural network archi- 
tecture to implement this algorithm. Finally, the efficacy 
of this method is verified with a real example. 


II. Dynamic Time Normalization Algorithm 


Speech can be expressed in terms of proper parameters. 
We use Eigen vectors to express voice signals: 


$$A = a_1, a_2, \ldots, a_i, \ldots, a_I$$ 


Now, let us consider how to eliminate the time difference 
between these two speech patterns. Consider the I-J 
plane shown in Figure 1. The I axis and J axis represent 
the unknown and reference pattern, respectively. The 
time difference between these two patterns may be 
expressed by a point sequence c = (i, j): 


F = c(1), c(2), ..., c(k), ..., c(K) (2) 


where 


c(k) = (i(k), j(k)) (3) 
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Figure 1. Limiting Window for Normalization on I-J 
Plane and Normalization Function 





$$B = b_1, b_2, \ldots, b_j, \ldots, b_J \qquad (1)$$ 


F is known as the normalization function. It is used to 
complete pattern matching from A to B. When there is 
no time difference between the two input speech pat- 
terns, the normalization function is the straight diagonal 
line i = j. As the time difference increases, the normal- 
ization function deviates farther from the straight line. 
In order to determine the difference between Eigen 
vectors $a_i$ and $b_j$, let us define the local distance (LD) as 
follows: 


$$d(c) = d(i, j) = \| a_i - b_j \| \qquad (4)$$ 


The weighted sum of LD is defined as the general 
distance (GD) 


$$E(F) = \sum_{k=1}^{K} d(c(k))\, W(k) \qquad (5)$$ 


where W(k) is the weight coefficient which in conjunc- 
tion with the slope constraint' controls the normaliza- 
tion function. When F is determined, E(F) is minimized. 
Correspondingly, the time difference between two 
matching specimens is adjusted. The time-normalized 
distance between patterns A and B is defined as follows: 


$$D(A, B) = \min_{F} \left[ \frac{\sum_{k=1}^{K} d(c(k))\, W(k)}{\sum_{k=1}^{K} W(k)} \right] \qquad (6)$$ 


where $\sum_{k=1}^{K} W(k)$ in the denominator is used to compensate for the effect of 
the weight on F. 





Based on unique features associated with speech, a 
number of constraints such as monotony and continuous 
boundary of variation of speech parameters' are 
imposed on the normalization function. Usually, the 
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search area of the normalization function is limited to a 
window on the I-J plane, called the limiting window, as 
shown in Figure 1. 


During the minimization process for D(A, B), the GD for 
every point shown in Figure 1 is determined. They may 
become a point in the point sequence of the normaliza- 
tion function F. The following iteration equation is used 
to optimize D(A, B). 


$$g(c(k)) = \min_{c(k-1)} \left[ g(c(k-1)) + d(c(k))\, W(k) \right] \qquad (7)$$ 


In Figure 1, from left to right and from bottom to top, 
the partial general distance (PGD) is calculated step by 
step. In the equation, c(k - 1) includes all matching points 
in the previous step. We will discuss how to implement 
this process with two recurrent neural subnets in the 
following. In this case, every point is linked by code. For 
the problem under investigation in this work, equation 
(7) may be written as: 


$$g(i, j) = \min \begin{cases} g(i, j-1) + d(i, j) \\ g(i-1, j-1) + 2\,d(i, j) \\ g(i-1, j) + d(i, j) \end{cases} \qquad (8)$$ 


Here, $\sum W(k) = I + J$. When the computation reaches the 
last point (I, J), D(A, B) can be optimized and the value 
is g(I, J)/(I + J). 


If an optimization strategy table is used to record the 
local optimization path employed to derive the current 
PGD, then the normalization function F may search in 
the reverse direction. 
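
The recursion of equation (8), together with the strategy table used for the backward search, can be sketched in software as follows. This is a minimal illustration, not the network implementation discussed in this paper. The function name `dtw`, the zero-based indexing, the start condition g(1, 1) = 2d(1, 1) and the way the limiting window is widened are assumptions that the text does not state.

```python
import math

def dtw(a, b, window=None):
    """Time-normalized distance D(A, B) and normalization path F.

    a, b: sequences of feature vectors (tuples of floats).
    Indices are zero-based, so point (i, j) here is (i+1, j+1) in the text.
    """
    I, J = len(a), len(b)
    # Limiting window |i - j| <= r, widened if needed so (I, J) stays reachable.
    r = max(I, J) if window is None else max(window, abs(I - J))
    g = [[math.inf] * J for _ in range(I)]    # partial general distance (PGD)
    btp = [[None] * J for _ in range(I)]      # optimization strategy table
    g[0][0] = 2.0 * math.dist(a[0], b[0])     # assumed start condition
    for i in range(I):
        for j in range(J):
            if (i == 0 and j == 0) or abs(i - j) > r:
                continue
            d = math.dist(a[i], b[j])         # local distance, eq. (4)
            options = []                      # the three local paths of eq. (8)
            if j > 0:
                options.append((g[i][j - 1] + d, (i, j - 1)))
            if i > 0 and j > 0:
                options.append((g[i - 1][j - 1] + 2.0 * d, (i - 1, j - 1)))
            if i > 0:
                options.append((g[i - 1][j] + d, (i - 1, j)))
            g[i][j], btp[i][j] = min(options)
    # Backward search from the last point to the first point.
    path = [(I - 1, J - 1)]
    while path[-1] != (0, 0):
        i, j = path[-1]
        path.append(btp[i][j])
    path.reverse()
    return g[I - 1][J - 1] / (I + J), path
```

For two identical patterns the returned path is the diagonal i = j and the distance is zero, in line with the description of the normalization function above.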


III. Neural-Network-Guided Computing Architecture 


In the optimization process, a recurrent chain table is 
created for all states associated with axes I and J. Each 
state is coded with a binary state vector Y in a recurrent 
network, i.e., the so-called network layer I and network 
layer J of the guiding network. I and J are 
coded with $M_1$ and $M_2$ neurons, respectively, each taking a 
value in {-1, 1}. Furthermore, the codes satisfy $2^{M_1} > I$ 
and $2^{M_2} > J$. In 
Figure 1, to make it convenient to define the position of 
a point, I and J are coded and the two codes are 
combined into a state vector Y. Since the number of 
neurons grows logarithmically with respect to I and J, the 
scale of the network is suited for the matching of a large 
number of samples. 
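
The coding of the axes can be illustrated with a short sketch. The helper names `bits_needed`, `encode` and `decode` are hypothetical, and the most-significant-bit-first ordering is an assumption; the text only requires a binary code over {-1, 1} with $2^M$ larger than the axis length.

```python
def bits_needed(n):
    """Smallest M such that 2**M > n."""
    return n.bit_length()

def encode(index, M):
    """Code an index as M values in {-1, 1}, most significant bit first."""
    return tuple(1 if (index >> (M - 1 - h)) & 1 else -1 for h in range(M))

def decode(y):
    """Recover the index from its {-1, 1} code."""
    value = 0
    for y_h in y:
        value = (value << 1) | (1 if y_h == 1 else 0)
    return value

# A point (i, j) is described by concatenating the two codes.
I, J = 100, 60
M1, M2 = bits_needed(I), bits_needed(J)
Y = encode(37, M1) + encode(12, M2)
```

With I = 100 and J = 60 only 7 + 6 = 13 neurons are needed, which illustrates the logarithmic growth mentioned above.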


A state is used to describe a point with reference to 
guiding layer network I and J. A converter is used to 
match the binary state vector with the corresponding 
PGD and BTP (backward tracking pointer). This con- 
verter is implemented with the help of a forward network 
layer called the memory layer. 
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When the optimization process is completed, if neces- 
sary, it is possible to construct the optimal normalization 
path F by using the associative memory BTP. The 
tracking begins from the last node; the normalization 
function F may be reconstructed by connecting the nodes 
pointed out by BTP neurons. 


1. Guiding Layer 


This layer contains two recurrent neural subnets com- 
prised of $M_1$ and $M_2$ fully connected higher-order neurons, 
respectively. We have $R = M_1 + M_2$. The state of the 
neurons in this layer is expressed as $S = \{y^1, y^2, \ldots, y^{2^R}\}$, 
where $y = (y_1, y_2, \ldots, y_R) \in \{-1, 1\}^R$. 
Assuming y and y' are the 
past and present state, then the recurrent process can be 
described as follows: 


$$y'_h = W_h \cdot \eta(y) = \sum_{a} W_h^{a}\, \eta_a(y) \qquad (9)$$ 


The summation covers all M-bit binary 
strings $a = a_1 a_2 \ldots a_M \in \{0, 1\}^M$. We use 
$W_h = (W_h^{00\ldots0}, W_h^{00\ldots1}, \ldots, W_h^{11\ldots1})$ to express the $2^M$-dimen- 
sional higher-order weight vector for neuron h. Then, by 
using $\eta = (\eta_{00\ldots0}, \eta_{00\ldots1}, \ldots, \eta_{11\ldots1}) \in \{-1, 1\}^{2^M}$ and 


$$\eta_a(y) = \prod_{h=1}^{M} y_h^{a_h} \qquad (10)$$ 


the state vector y may be expanded into a $2^M$-dimen- 
sional state vector. The hth bit of a in the equation 
provides the power of $y_h$. From the above, the 
architecture provided by equations (9) and (10) is 
sufficient to produce any mapping $y \to y'$: 
$\{-1, 1\}^M \to \{-1, 1\}^M$. The learning process of 
the recurrent weight is: 


$$W_h = \frac{1}{2^M} \sum_{s=1}^{2^M} y_h^{s+1}\, \eta(y^s) \qquad (11)$$ 


where $y^{s+1}$ is the state that follows $y^s$. 

The state evolves in the recurrent subnets to create the 
node sequence shown in the chain table. In order to 
establish such two recurrent subnets, consider the fol- 
lowing finite sequence $y^1 \to y^2 \to \ldots \to y^k$; 
$k \in \{1, \ldots, 2^M\}$. As discussed in the previous 
section, all neuron states are linked from bottom to top 
along the I axis and from left to right along the J axis 
when the system is initialized. Based on equation (9), a 
new state is totally determined by the state at the 
preceding moment. In addition, it is defined that $y^1$ 
immediately follows $y^{2^M}$ to form 
a recurrent chain. 
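
The following sketch illustrates how a higher-order subnet of this kind can store a recurrent chain. It assumes the product expansion of equation (10) and a learning rule of the form of equation (11), in which each state is associated with its successor. The function names `eta`, `learn_chain` and `step` are hypothetical.

```python
from itertools import product

def eta(y):
    """Expand y in {-1, 1}^M into the 2^M products of equation (10)."""
    out = []
    for a in product((0, 1), repeat=len(y)):
        p = 1
        for y_h, a_h in zip(y, a):
            if a_h:
                p *= y_h
        out.append(p)
    return out

def learn_chain(chain):
    """Weights that map every state of the chain onto its successor.
    The successor of the last state is the first one (recurrent chain)."""
    M = len(chain[0])
    n = 2 ** M
    W = [[0.0] * n for _ in range(M)]
    for s, y in enumerate(chain):
        nxt = chain[(s + 1) % len(chain)]
        e = eta(y)
        for h in range(M):
            for a in range(n):
                W[h][a] += nxt[h] * e[a] / n
    return W

def step(W, y):
    """One evolution of the subnet: y'_h = W_h . eta(y)."""
    e = eta(y)
    return tuple(round(sum(w * x for w, x in zip(W_h, e))) for W_h in W)

chain = list(product((-1, 1), repeat=3))   # all 2^3 states, in a fixed order
W = learn_chain(chain)
```

The sketch works because the expanded vectors of two different states are orthogonal, so the weighted sum in `step` picks out exactly the successor that was stored for the current state.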


In this network, to increase or decrease the number of 
states is a simple operation. If the present state $y^k$ is to 
be deleted, it can be accomplished by breaking the links 
established by equation (11) that connect the present 
state to the previous state and the present state to the 
future state. 
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the chain can be reconnected. 


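$$\Delta W_h = -\frac{1}{2^M} \left( y_h^{k}\, \eta(y^{k-1}) + y_h^{k+1}\, \eta(y^{k}) \right) \qquad (12)$$ 


Here $y^k$ is the present state, $y^{k-1}$ the previous state and $y^{k+1}$ the future state. 
Then, according to the following 
equation 


$$\Delta W_h = \frac{1}{2^M}\, y_h^{k+1}\, \eta(y^{k-1}) \qquad (13)$$ 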


This simple delete operation makes it easy to reconnect 
the chain in this network. When a new state is added to 
the chain table, it is implemented by using equations (1 2) 
and (13). The difference is that equation (13) must 
connect the previous state to the new state and the new 
State to the future state. 
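
The delete operation can be sketched in the same spirit. The code assumes that equation (12) removes the two links that touch the present state and that equation (13) adds a link from the previous state to the future state. The helper names are hypothetical.

```python
from itertools import product

def eta(y):
    """Product expansion of equation (10)."""
    out = []
    for a in product((0, 1), repeat=len(y)):
        p = 1
        for y_h, a_h in zip(y, a):
            if a_h:
                p *= y_h
        out.append(p)
    return out

def add_link(W, src, dst, sign=1.0):
    """Add (sign=+1) or remove (sign=-1) the link src -> dst."""
    scale = sign / 2 ** len(src)
    e = eta(src)
    for h in range(len(W)):
        for a in range(len(e)):
            W[h][a] += scale * dst[h] * e[a]

def delete_state(W, prev, cur, nxt):
    add_link(W, prev, cur, -1.0)   # first term of equation (12)
    add_link(W, cur, nxt, -1.0)    # second term of equation (12)
    add_link(W, prev, nxt, +1.0)   # equation (13): reconnect the chain

def step(W, y):
    e = eta(y)
    return tuple(round(sum(w * x for w, x in zip(W_h, e))) for W_h in W)

# Build a ring of four states, then delete the second one.
ring = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
W = [[0.0] * 4 for _ in range(2)]
for s in range(4):
    add_link(W, ring[s], ring[(s + 1) % 4])
delete_state(W, ring[0], ring[1], ring[2])
```

Adding a state would use the same `add_link` helper twice, once from the previous state to the new state and once from the new state to the future state.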


2. Memory Layer 


In order to perform digital computing and BTP 
recording, a forward network is introduced to corre- 
spond all states in the first recurrent subnet with all 
interconnected LD, PGD and BTP. It is called the 
memory layer. 


This network operates in a manner similar to the first 
layer. The layer has R binary inputs, but outputs only 
two real numbers; one is LD, which is later upgraded to 
PGD, and the other is BTP. Therefore, the layer has two 
weight vectors, $W_d$ and $W_b$. The learning process for the 
mapping of the state vector onto LD follows the same 
manner as expressed by equation (11). 


$$W_d = \frac{1}{2^R} \sum_{s=1}^{2^R} d(y^s)\, \eta(y^s) \qquad (14)$$ 


In the DTW treatment in this work, every node in the 
chain table has three local paths to select. After choosing 
the optimal local path based on equation (8), the min- 
imum PGD is obtained. In addition, the previous point 
before the optimal path is denoted for backward search. 
This can be accomplished by using PGD to upgrade the 
LD neuron. This upgrade means that the weight $W_d$ is 
updated according to the following: 


$$\Delta W_d = \frac{1}{2^R} \left( g(y^s) - d(y^s) \right) \eta(y^s)$$ 


Furthermore, the BTP neuron takes a value in 
{-1, 0, 1} and each value represents a possible local 
path. We have 


$$W_b = \frac{1}{2^R} \sum_{s=1}^{2^R} btp(y^s)\, \eta(y^s) \qquad (15)$$ 
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The BTP neuron may be initialized by training the 
weight so that the output of every column of the I-J plane 
from bottom to top is -1. When a point is located outside 
the limiting window, its output is 1. In other situations, 
its output is 0. 
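
The behavior of the memory layer can be illustrated as a small associative store. The sketch assumes that the stored value for a state is read back as the inner product of a weight vector with the expanded state, and that an upgrade changes only the entry of the addressed state. The function names `train`, `read` and `upgrade` are hypothetical.

```python
from itertools import product

def eta(y):
    """Product expansion of the binary state vector."""
    out = []
    for a in product((0, 1), repeat=len(y)):
        p = 1
        for y_h, a_h in zip(y, a):
            if a_h:
                p *= y_h
        out.append(p)
    return out

def train(values):
    """values: dict mapping each state to a real number (for example LD)."""
    R = len(next(iter(values)))
    W = [0.0] * 2 ** R
    for y, v in values.items():
        for a, e_a in enumerate(eta(y)):
            W[a] += v * e_a / 2 ** R
    return W

def read(W, y):
    """Output of the neuron for state y."""
    return sum(w * e for w, e in zip(W, eta(y)))

def upgrade(W, y, new_value):
    """Replace the value stored for state y, e.g. LD by PGD."""
    delta = new_value - read(W, y)
    for a, e_a in enumerate(eta(y)):
        W[a] += delta * e_a / 2 ** len(y)

states = list(product((-1, 1), repeat=2))
W_d = train({s: float(i) for i, s in enumerate(states)})
upgrade(W_d, states[2], 7.5)
```

Because the expanded state vectors are mutually orthogonal, upgrading one state leaves the values stored for all other states unchanged.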


3. Auxiliary Control Unit 


Figure 2 shows the complete network architecture. In the 
local path processing module, some control units are 
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Figure 2. Flowchart of Network Control 





The comparator COM is used to make a determination 
on the result produced by equation (8). Corresponding to 
the three input values, the comparator has two different 
outputs, the minimum g(i, j) and BTP marker. The 
output of the comparator simultaneously trips the 
upgrade of the LD neuron and the recording of the BTP 
marker. 


An accumulator is placed in front of the comparator to 
generate the three possible PGD values for comparison. 


The addresser ADDR monitors and records the possible 
initial addresses of local paths in subnets I and J. The 
unit is comprised of registers CUR to store the current 
state, and PRE1, PRE2 and PRE3 for storage of previous 
states (i, j-1), (i-1, j-1) and (i-1, j), respectively. For a 
given state (i, j), it looks up the address for state (i, j-1) by 
keeping subnet I invariant and evolving the J subnet J-1 
times. Similarly, the addressing of (i-1, j-1) and (i-1, j) is 
done in the same fashion. 
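
The addressing scheme relies on the fact that, in a recurrent chain of J states, J-1 forward evolutions are equivalent to one step backward. A minimal sketch, with hypothetical function names and one-based state indices:

```python
def evolve(index, ring_size, times=1):
    """Advance a one-based state index around a recurrent chain."""
    return (index - 1 + times) % ring_size + 1

def predecessors(i, j, I, J):
    """Addresses to be stored in PRE1, PRE2 and PRE3 for state (i, j)."""
    i_prev = evolve(i, I, I - 1)   # I-1 forward evolutions = one step back
    j_prev = evolve(j, J, J - 1)   # J-1 forward evolutions = one step back
    return (i, j_prev), (i_prev, j_prev), (i_prev, j)

pre1, pre2, pre3 = predecessors(3, 4, 5, 6)
```

Here `pre1`, `pre2` and `pre3` are the states (i, j-1), (i-1, j-1) and (i-1, j) for the current state (3, 4).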


A number of triggers are employed in the system. They 
are added to the BTP neurons to guide the computing 
process. 


Trigger SIG reports the completion of the normalization 
process, i.e., the optimization of F, and triggers the 
backward search. 


Triggers IINC, JINC and PBLK control the operations in 
the evolution and processing modules of subnets I and J. 
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IV. Speech Recognition Experiment Using Isolated 
Words 


Speech signals are converted to digital signals by a 22 
kHz A/D converter. The speech waveform is pre- 
processed with a partially overlapping Hamming 
window. The signal window is 22 milliseconds wide. 
Eigenvectors are extracted by passing the signal through a 20- 
channel narrow-band filter bank. The speech databank 
is comprised of 10 single-syllable words of the numbers 
0-9 in Cantonese. The reference samples are provided by 
20 males and 20 females. The test speech samples are 
taken from five females and five males other than those 
providing the referencing samples. 
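
The framing step of this preprocessing can be sketched as follows. The exact sampling rate (taken here as 22,000 Hz), the 50 percent overlap and the function names are assumptions; the text only states a 22 kHz converter, a 22 millisecond window and partial overlap. The 20-channel filter bank is not modeled.

```python
import math

def hamming(N):
    """Hamming window of N samples."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]

def frames(signal, sample_rate=22000, width_s=0.022, overlap=0.5):
    """Cut a signal into overlapping, Hamming-weighted frames."""
    N = round(sample_rate * width_s)            # 484 samples per window
    hop = max(1, round(N * (1.0 - overlap)))    # assumed 50 percent overlap
    w = hamming(N)
    return [[s * w_n for s, w_n in zip(signal[start:start + N], w)]
            for start in range(0, len(signal) - N + 1, hop)]

windowed = frames([1.0] * 1000)
```

Each frame would then be passed to the filter bank to obtain one vector of the pattern A or B.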


The Euclidean distance between the two vectors is used to 
initialize the local distance LD of the network. The 
memory layer is initialized according to equation (14). 
The chain table of all the states is initialized prior to 
performing matching in subnets I and J independently. 
For a given vector pair to be matched (A, B), according to 
equations (12) and (13), states $y^{I+1}, \ldots, y^{2^{M_1}}$ 
in net I are deleted from the subnet. State $y^I$ is 
connected to $y^1$ to make the subnet recurrent. To deter- 
mine the position of $y^{i-1}$ from $y^i$ is equivalent to having the 
subnet go through I-1 evolutions. Similarly, extraneous 
states in network J may be deleted accordingly. 


Starting from state (1, 1), based on the description in 
Section 3, the outputs of the BTP neurons are sequen- 
tially read by triggers IINC, JINC and PBLK to deter- 
mine the interconnection between states. Then, BTP is 
renewed to denote the optimal local path. In addition, 
the LD neuron is upgraded to the PGD value. When 
the BTP neuron output is 0 or 1, trigger IINC trips 
network I to evolve. When the BTP output is -1, trigger 
JINC trips network J to evolve. When the BTP output 
is 0, trigger PBLK trips the processing module into 
operation. When SIG indicates that the search is over, 
the general distance g(I, J) is compared to the min- 
imum general distance MGD that is already stored in a 
register. If it is less than the MGD, then the corre- 
sponding reference sample index (ID) is replaced by 
the new word. Once the sample to be identified is 
compared with all the reference samples, the recogni- 
tion process is completed. Figure 2 shows the control 
flowchart using neural-network-guided computation 
for speech recognition. 
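
The outer recognition loop, in which the minimum general distance MGD and the index of the best reference sample are kept in registers, can be sketched as follows. The function name `recognize` and the form of the `distance` argument are assumptions; any routine that returns the normalized distance between two patterns can be supplied.

```python
import math

def recognize(unknown, references, distance):
    """Return the ID of the reference with the smallest distance, and that distance.

    references: iterable of (ref_id, pattern) pairs.
    distance:   callable giving the general distance between two patterns.
    """
    mgd, best_id = math.inf, None        # MGD register and ID register
    for ref_id, pattern in references:
        g = distance(unknown, pattern)
        if g < mgd:                      # replace MGD and ID if the new one is smaller
            mgd, best_id = g, ref_id
    return best_id, mgd

# Toy usage with a simple stand-in distance.
toy_distance = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))
refs = [("one", [1.0, 1.0]), ("two", [2.0, 2.0])]
word, score = recognize([2.0, 2.5], refs, toy_distance)
```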


The optimal normalization function F can be denoted 
using the BTP neurons. Working just as ADDR in the 
processing module, the optimal path can be backtracked 
step by step. It is then linked together until reaching the 
starting point (1, 1). 


It was found experimentally that the recognition rate for 
isolated words spoken by an unspecified person is as high 
as 99.0 percent. This is comparable to that of any 
conventional algorithm.‘ 
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V. Conclusions 


This experiment points out a way to implement a con- 
ventional DTW technique by using a hardware network. 
This finding is of critical significance. As we know, a 
hard-wired architecture will undoubtedly increase the 
computing speed to reach optimization. As for the neural 
network presented in this work, the LD may be com- 
puted in parallel before optimization. All these advan- 
tages, on top of the apparent superiority of the parallel 
architecture of the neural network, make the implemen- 
tation of DTW more effective. 
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[Text] 


Abstract 


This paper presents an approximate logic system. Not 
only is the logic value of the system fuzzy, but the 
logic operator can be fuzzy as well. This approximate 
logic system is very suitable for a neural network. A 
neural network model is established based on this con- 
cept. It is capable of learning and storing knowledge. On 
the basis of this neural net, a multi-expert opinion 
synthesizer is developed. 


Key words: approximate logic, and-or degree of oper- 
ator, expert opinion synthesis. 
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I. Introduction 


Neural networks have received widespread attention 
because of their ability to process inaccurate and fuzzy 
information. Nevertheless, the use of neural networks to 
process knowledge is very limited. In order to overcome 
this weakness, an approximate logic is defined. This 
logic can properly describe a neural network. It not only 
has a fuzzy logic value but also a fuzzy logic operator. 
Finally, a neural-network-based multi-expert opinion 
synthesis system is also described. It is capable of storing 
rules and can be trained. The conventional weight 
method is equivalent to a special case where the neural 
net does not have any knowledge in storage. After 
receiving opinions from various experts, it enters a 
pondering period and then makes the final judgment. 
This process is similar to that in the human mind and is 
worth further investigation. 


II. Approximate Logic and Neural Network 


1. Approximate Logic 


An approximate logic system is defined as follows. The 
logic value of the system lies in [0, 1]. By varying 
some parameters, a function may be switched from “or” 
to “and,” or from “yes” to “no.” Unlike fuzzy 
logic, not only is its logic value fuzzy, but its operator 
can be fuzzy as well. The following theorems are fairly easy to 
prove. Due to page limitation, the proofs are omitted. 





Definition 1. Approximate Logic. The logic value of the system is between [0, 1]. $A_1, A_2, \ldots, A_k$ are the approximate 
logic variables; tr and $w_i$, i = 1, ..., k, are a constant and weights, respectively. The weight is between [-1, 1]: 


$$F = \sum_{i=1}^{k} w_i \cdot A_i$$ 


$$AF(A_1, A_2, \ldots, A_k) = \begin{cases} F & \text{if } 0 \le F \le 1 \text{ and } F \ge tr \\ 1 & \text{if } 1 < F \text{ and } F \ge tr \\ 0 & \text{otherwise} \end{cases}$$ 


It is easy to see that the inverse of the function AF is a multi-valued function. 


Theorem 1. Assume that $A_i$ can only be either 0 or 1. 


(1) if tr = 1, $w_i$ = 1, i = 1, ..., k, then $AF(A_1, \ldots, A_k) = A_1 \vee A_2 \vee \ldots \vee A_k$ 

(2) if tr = k, $w_i$ = 1, i = 1, ..., k, then $AF(A_1, \ldots, A_k) = A_1 \wedge A_2 \wedge \ldots \wedge A_k$ 

(3) if $w_1$ = -1, $w_2$ = 1, tr = 1, then $AF(A_1, 1) = \neg A_1$ 

(4) if $w_1$ = 1, tr = 0, then $AF(A_1) = A_1$ 


Therefore, every Boolean function can be expressed in terms of AF. 
Proof: Omitted. 
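
The four cases of Theorem 1 can be checked numerically with a short sketch. It assumes that the comparisons with the threshold in Definition 1 are non-strict (F ≥ tr), which is the reading under which case (1) yields the Boolean "or". The function name `AF` follows the text; everything else is illustrative.

```python
from itertools import product

def AF(A, w, tr):
    """Approximate logic function of Definition 1."""
    F = sum(w_i * A_i for w_i, A_i in zip(w, A))
    if F < tr:
        return 0
    if F > 1:
        return 1
    return F if F >= 0 else 0

k = 3
for A in product((0, 1), repeat=k):
    or_value = AF(A, [1] * k, 1)      # case (1): tr = 1
    and_value = AF(A, [1] * k, k)     # case (2): tr = k
    assert or_value == (1 if any(A) else 0)
    assert and_value == (1 if all(A) else 0)

not_values = [AF((a, 1), [-1, 1], 1) for a in (0, 1)]   # case (3)
same_values = [AF((a,), [1], 0) for a in (0, 1)]        # case (4)
```

Only the threshold differs between the "or" and the "and" case, which illustrates how an operator can be changed by varying a parameter.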
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Definition 2. If the weight of a function with n variables 
belongs to [0, 1] and its threshold is 0 < tr < n, then the 
“and/or” degree (aod) of the operator is defined as follows: 


$$aod(AF) = tr \Big/ \sum_{i=1}^{n} w_i$$ 


When aod changes from 0 to 1, the function 


$\bigvee_{i=1}^{n} A_i$ becomes $\bigwedge_{i=1}^{n} A_i$. 


Through the use of aod, a fuzzy operation can be 
discretized into corresponding Boolean operations. 
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2. Neural Network Model 


Our neural network is comprised of four parts: a weight set, 
a nodal point set, a threshold set and an output set. $w_{ij} \in [-1, 1]$ 
is the weight from node i to j. Let us assume that $w_{ii} \in [0, 1]$. 
$s_i$ is the excitation value of node i and $out_i$ is the output. 
Then, the system operates according to the following: 


$$s_i(t+1) = \sum_{j=1}^{n} w_{ji} \cdot out_j(t) + b_i(t) \cdot ext_i(t) \qquad (1)$$ 


where b,(t) is a coefficient in [0, 1]. 





$$out_i(t) = \begin{cases} s_i(t) & \text{if } s_i(t) \ge tr_i \text{ and } min \le s_i(t) \le max \\ max & \text{if } s_i(t) > max \text{ and } s_i(t) \ge tr_i \\ min & \text{otherwise} \end{cases} \qquad (2)$$ 


In general, max = 1, min = 0, or max = 1, min = -1. The 
net input of the system is defined as: 


$$net_i(t) = \sum_{j \ne i} w_{ji} \cdot out_j(t) + b_i(t) \cdot ext_i(t) \qquad (3)$$ 

$$s_i(t+1) = net_i(t) + w_{ii} \cdot out_i(t) \qquad (4)$$ 


Since all propositions are self-supporting, $w_{ii} \in [0, 1]$. 


Definition 3. The Boolean system is a binary system and 
its output is either 0 or 1. In this case, max = 1, min = 0. 


Theorem 2. If a system has max = 1 and min = 0, and 
each weight belongs to [0, 1], then each node is equiva- 
lent to an approximate function. 


Proof: Omitted. 


Deduction 1. Every Boolean function can be imple- 
mented using a Boolean system. 


Proof: Omitted. 


Theorem 3. If $net_i(t) = p$, where p is a constant, and min 
< p < max, then: if $p + w_{ii} \cdot out_i(t) < tr_i$, then $out_i(t+k)$ 
= min, k = 1, 2, ...; if $p + w_{ii} \cdot out_i(t) > tr_i$, $w_{ii}$ = 1 and p 
> 0, then for $k > (max - tr_i)/p$, $out_i(t+k)$ = max. If $w_{ii}$ = 
1 and p < 0, then for $k > (max - tr_i)/p$, $out_i(t+k)$ = min. 
If $p + w_{ii} \cdot out_i(t) > tr_i$, $0 < w_{ii} < 1$ and $p/(1 - w_{ii}) > tr_i$, 
then $out_i(t+k) \to p/(1 - w_{ii})$ when $k \to +\infty$; if $p + w_{ii} \cdot out_i(t) > tr_i$, 
$0 < w_{ii} < 1$ and $p/(1 - w_{ii}) < tr_i$, then 
$out_i(t+k) \to$ min when $k \to +\infty$. 


Proof: Omitted. 


Conclusion. If $net_i$ is a constant, then the final $out_i$ is 
stable. 
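
The behavior of a single node under a constant net input can be explored with a short simulation. The sketch assumes the output rule of equation (2) with non-strict threshold comparisons and the update of equation (4). The function names `out` and `run` and the numerical values are illustrative only.

```python
def out(s, tr, lo=0.0, hi=1.0):
    """Output rule of equation (2) with min = lo and max = hi."""
    if s < tr:
        return lo
    if s > hi:
        return hi
    return s if s >= lo else lo

def run(p, w_ii, tr, out0, steps, lo=0.0, hi=1.0):
    """Iterate s(t+1) = p + w_ii * out(t) for a constant net input p."""
    o = out0
    for _ in range(steps):
        o = out(p + w_ii * o, tr, lo, hi)
    return o

settled = run(p=0.2, w_ii=0.5, tr=0.0, out0=0.0, steps=60)     # tends to p / (1 - w_ii)
dropped = run(p=0.2, w_ii=0.5, tr=0.45, out0=1.0, steps=60)    # p / (1 - w_ii) below threshold
saturated = run(p=0.3, w_ii=1.0, tr=0.2, out0=0.0, steps=10)   # w_ii = 1 and p > 0
```

In the three runs the output settles at p/(1 - w_ii) = 0.4, falls to min, and saturates at max, respectively, which matches the stable final outputs stated in the conclusion above.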


Definition 4. A neural network can be viewed as a directed 
graph, called a neural net graph. Each neuron is a node of 
the graph. Each non-zero weight is an edge of the graph. 


Theorem 4. In a neural network, if all the inputs $ext_i$ and 
their weights are constants, and if the neural network graph 
has no loop, then the system will not oscillate in operation. 


Proof: Omitted. 


Definition 5. The deduction process of the neural net- 
work is as follows: A node that does not receive infor- 
mation from other nodes is a conditional node. A 
conditional node is a premise for the system to accept a 
deduction. When the system operates for some time, if 
other nodes have been stabilized, then the state of these 
nodes is used to derive a conclusion under the premise. 
A looped neural network is usually a chaotic system and 
its qualitative stability analysis is very difficult to per- 
form. It usually oscillates because there are often contra- 
dictory bits of information. 


Definition 6. Linear Stability Condition. If $O = (o_1, o_2, \ldots, o_n)$ 
represents the steady states of the neural net- 
work and min < $o_i$ < max for all i, then O is said to be 
linearly stable. 


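Theorem 5. Let us assume that in a system the coefficient 
b(t) of equation (1) is zero, then the condition for linear 
stability is: 


$$\begin{bmatrix} w_{11} & \cdots & w_{n1} \\ \vdots & & \vdots \\ w_{1n} & \cdots & w_{nn} \end{bmatrix} \begin{bmatrix} o_1 \\ \vdots \\ o_n \end{bmatrix} = \begin{bmatrix} o_1 \\ \vdots \\ o_n \end{bmatrix} \qquad (5)$$ 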


Proof: Omitted. 
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III. Neural-Network-Based Multi-Expert Opinion 





Synthesis System 


Figure 1. Layered Neural Network 


1. Purpose of Use 


Expert systems are being developed from a simple data- 
bank of a single expert to a complex system involving 
multiple experts. In a distributed multi-expert system, 
the key part is a subsystem that synthesizes the opinions 
of the experts. Conventional methods, including the 
weighted method, the voting method and the statistical method, 
have the following major disadvantages: 


(1) There is no mechanism to incorporate experience 
into the management of synthesis of expert opinions. 


(2) It is not possible to provide training. 


(3) It is impossible to reflect the situation wherein the 
experts’ opinions cannot be mediated. 


Since the ability to synthesize experts’ opinions is 
directly related to human intelligence, it is not possible 
to create an ideal model for the synthesis of experts’ 
opinions. We believe that if a model can be created, the 
model ought to have a number of tunable parameters to 
meet the needs in different disciplines. Furthermore, it 
ought to be able to overcome the three major disadvan- 
tages described above, at least to some extent. This 
model will be helpful to our effort to improve the 
capability of the system to synthesize experts’ opinions. 
As a result of our investigation, it was discovered that the 
competition and coordination mechanism of a neural 
network can perform these functions well. 


2. System Architecture 


A state cell is used to represent an expert’s judgment. A 
state cell is connected to a judgment. The value of a state 
is between [0, 1]. Each cell in the system has an input end 
$ext_i$. Various opinions of the experts are sent into the 
system through these input ends, as shown in Figure 2. In 
equation (3), b(t) can be considered as the weight for the 
expert. Prior to operation, common knowledge of the 
experts is stored in the system. There are two ways to 
gather experts’ opinions. One is a parallel listening 
method. The method collects all the opinions of the 
experts and provides the result to the system after 
applying a weight. This method is very similar to the 
conventional methods. Its advantage is simplicity. How- 
ever, its shortcoming is that it cannot reflect disagree- 
ment among experts in detail. It is only able to reflect the 
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situation wherein the collective opinion of the experts is 
consistent with the knowledge stored in the network. The 
other way is a serial method which listens to experts’ 
opinions one at a time. After listening to the opinion of 
each expert, the neural network is allowed to complete 
several cycles of processing before listening to the next 
expert. The opinions of certain experts can also be 
listened to repeatedly. This process is very similar to that 
used by a human. In this serial mode, if $w_{ii}$ = 1, $w_{ij}$ = 0 for 
i not equal to j, and $tr_i$ = min, the results provided by the 
system are the weighted values. The parameters of the 
system must be tuned in practice. The neural network is 
usually unstable. In other words, it is impossible to 
inquire about the result after the system has calmed 
down completely. 


In order to solve this problem, a waiting period number 
T and an observation period number L are specified for 
the system. After the system listens to the opinion of the 
last expert, it is allowed to operate for T cycles. This is 
equivalent to allowing the system to think based on the 
knowledge it stores. Then, the system is observed over L 
cycles. During this period, the state of every cell is 
analyzed. The synthesis result is obtained by finding the 
mean value of these observed values. It is easy to 
determine whether a cell is oscillating. As long as the 
observation period is equal to the oscillation period, a 
mean value of (max + min)/2 indicates that no judgment 
can be made. 
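
The waiting and observation procedure can be sketched independently of the network itself. The function name `synthesize` and the `step` argument, which stands for one operating cycle of the system, are assumptions.

```python
def synthesize(step, state, T, L):
    """Let the system run for T cycles, then average every cell over L cycles."""
    for _ in range(T):                   # waiting period: the system "thinks"
        state = step(state)
    totals = [0.0] * len(state)
    for _ in range(L):                   # observation period
        state = step(state)
        for i, s in enumerate(state):
            totals[i] += s
    return [t / L for t in totals]

# A cell that flips between min = 0 and max = 1 on every cycle.
flip = lambda cells: [1.0 - c for c in cells]
undecided = synthesize(flip, [0.0, 1.0], T=5, L=2)

# A cell that has settled keeps its value.
settled = synthesize(lambda cells: list(cells), [0.3], T=3, L=4)
```

For the flipping cells the mean is (max + min)/2, the value that indicates that no judgment can be made.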






Figure 2. Multi-Expert Opinion Synthesis System (input: experts' opinions) 





3. Results of Operation 


Definition 7. If $S = (s_1, s_2, \ldots, s_k)$ represents the result and 
$Z = (z_1, z_2, \ldots, z_k)$ is the experts’ opinion, the degree of 
acceptance of experts’ opinion is defined as follows: 


$$err = \sum_{i=1}^{k} (z_i - s_i)^2$$ 


Based on simulation, the following interesting phe- 
nomena have been discovered: 


(1) Self-contradictory opinions produce no decision or 
oscillation. 


If contradictory knowledge A — , ..., — > A is stored 
into the system, the system oscillates. 


(2) Winning with masses. 
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When a number of experts participate in a debate, if the 
opinion of the majority is consistent, the output of the 
system is in agreement with the majority opinion. Of 
course, the majority opinion must be consistent with the 
knowledge learned by the system. In Example 1, wo, = 
0.9, w,.=-0.8, w3, = 0.8 and w,, = 0.9. Other weights are 
0 and all thresholds are -3. 


From Tables 1 and 2, we can see that the opinion of 
expert Z, is different from that of others. The final 
operating result is also different. 


(3) Winning with few. 
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Since the system “ponders” for several cycles after 
listening to the opinions of all experts, during this 
period, the system forgets the opinions of certain experts 
that are incompatible with the knowledge stored in it. 
Therefore, the system sometimes reaches the conclusion 
that is consistent with the minority, as in Example 2. 


In this system, Wo, = 0.9, w,, = -0.8, w;. = 0.8 and w;, = 
0.9. Other weights are 0 and all thresholds are -3. 


From Tables 3 and 4 we can see that the viewpoint of 
expert Z, is different from that of others, but is consis- 
tent with the knowledge stored in the system. The final 
result is in agreement with his opinion. 



















































































Table 1. Results of Example 1 (? marks an entry that is illegible in the source) 

cycle    0      ?      ?      ?      ?      ?      ?      ?      ? 
expert   ?      ?      ?      ?      ?      ?      ?      ?      ? 
b(t)     0.60   0.60   0.60   0.60   0.40   0.40   0.30   0.30   0.30 
node ?   0.00   ?      0.40   0.40   0.28   0.28   0.21   ?      0.21 
node2    0.00  -0.22   ?      1.00   0.74   ?      ?      ?      ? 
node3    0.00   0.82  -0.82  -0.82  -0.64  -0.64  -0.40  -0.40  -0.40 
node ?   0.00  -0.30   0.45   0.45   0.82   0.82   0.27   0.27   0.27 


Table 2. Opinions of Experts in Example 1 

node     A0     A1     A2     A3     err 
Z0       0.65   0.80   0.78  -0.85   2.08 
Z1       0.80   0.68  -0.78   0.00   0.88 
Z2       0.70   0.60  -0.75   0.80   0.68 
Z3       0.70   0.80  -0.49   0.00   0.68 
Table 3. Results of Example 2 (? marks an entry that is illegible in the source) 

cycle    0      8      ?      18     23     26     ?      ? 
expert   ?      ?      ?      ?      ?      ?      ?      ? 
b(t)     0.60   0.60   0.60   0.40   0.40   0.80   0.30   0.90 
node0    0.00   0.36   0.47   0.34   0.34   0.30   0.30   0.30 
node1    0.00   0.23   0.80   0.83   0.93   0.37   0.37   0.37 
node2    0.00  -0.07   0.14  -0.58  -0.58  -0.22  -0.22  -0.22 
node3    0.00   0.36   0.46   0.38   0.38   0.28   0.28   0.28 


Table 4. Opinions of Experts in Example 2 

node     A0     A1     A2     A3     err 
Z0       0.60  -0.70  -0.90   0.60   1.35 
Z1       0.05  -0.50   0.50   0.01   2.10 
Z2       0.85   0.70  -0.35   0.05   0.87 
Z3       0.00  -0.50  -0.50   0.05   1.76 
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4. Learning From Samples 


As described earlier, the system architecture includes the 
interconnection among various judgments. These 
interconnections need to be corrected through a learning 
process in operation. The learning process was com- 
pleted by adopting the Hebb or Delta rule. If the opinion 
of a certain expert is extremely important, it is repeat- 
edly used to stimulate the system until the system 
accepts his opinion. 


Assuming X = (X,, X,...%,) is the opinion of that expert, 
then the weight from the ith to the jth cell becomes: 


Hebb rule: 


middle = (max +min) /2; 
OW, ;(t+1) =e (x, —middle) © (x, —middle); 
W,,Ct+1)=Gew, ,(t)+4W,,(t+1) 


a and u belong [0, 1]. 
Delta rule:


$$out_j = AF_j(x_1, x_2, \ldots, x_n)$$
$$\Delta W_{ij}(t+1) = \mu\,(x_j - out_j)\,out_i$$
$$W_{ij}(t+1) = \alpha\,W_{ij}(t) + \Delta W_{ij}(t+1)$$


$\alpha$ and $\mu$ belong to [0, 1]. Through the use of an
auxiliary node $R_i$, threshold learning can be changed to
weight learning. In this work, the threshold
$tr_i = \mathrm{middle}$ and $out_{R_i} = 1$. When it is
necessary to learn the threshold, the following rule may
apply: Assuming $AF_i$ is the approximate logic function of
node i, then


if $AF_i > \min$ and $x_i = \min$, then
$tr_{\mathrm{new},i} = AF_i(x_1, x_2, \ldots, x_n) + \mu \cdot \max$;


if $AF_i = \min$ and $x_i > \min$, then
$tr_{\mathrm{new},i} = x_i$.
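

The Delta rule and the threshold rule can be sketched the same
way. Several points here are assumptions rather than facts from
the source: the subscripts in the printed formulas are damaged,
so the sketch uses the standard Delta form (error at the
receiving cell j times the output of the sending cell i); the
factor multiplying max in the threshold rule is taken to be the
learning rate `mu`; and the threshold is left unchanged when
neither condition holds. The function names are hypothetical.

```python
def delta_update(weights, x, out, alpha=0.9, mu=0.1):
    """Return W(t+1) = alpha * W(t) + mu * (x_j - out_j) * out_i.

    x is the expert opinion (used as the target) and out holds the current
    node outputs; weights[i][j] is the weight from cell i to cell j.
    """
    n = len(x)
    return [[alpha * weights[i][j] + mu * (x[j] - out[j]) * out[i]
             for j in range(n)]
            for i in range(n)]


def learn_threshold(af_i, x_i, tr_i, lo, hi, mu=0.1):
    """Apply the two threshold cases from the text; lo/hi stand for min/max."""
    if af_i > lo and x_i == lo:
        return af_i + mu * hi      # node fired but the expert says "min"
    if af_i == lo and x_i > lo:
        return x_i                 # node silent but the expert disagrees
    return tr_i                    # assumption: otherwise keep the threshold


w = delta_update([[0.0, 0.0], [0.0, 0.0]], [1.0, 0.0], [0.5, 0.5],
                 alpha=1.0, mu=1.0)
```

The exact comparisons with `lo` mirror the printed conditions; a
tolerance would be needed for values produced by floating-point
arithmetic.
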
[Abstract] A Transputer-array-based General-Purpose
Parallel Neural Network Simulation System (GP²N²S²)
developed at USTC is described. This system provides
the neural network simulation language, editor, compiler,
and user’s executive program environment. The user’s
program, written in sequential code, can be automatically
implemented in parallel on a parallel Transputer system.
GP²N²S² can simulate not only several currently popular
neural networks, but also new neural network models
independently developed by the user.


The GP²N²S² hardware configuration and system software
module schematic are shown below in Figures 1
and 3, respectively. In Figure 3, PEM is the program
editing module, PCM is the program compiling module,
CSM is the control simulation module, GDM is the
graphics display module, ARCM is the array
reconfiguration module, ALM is the array loading module,
SCM is the simulation command module, NAM is the neuron
allocation module, SIDSM is the sampling input and
data sending module, SRM is the simulation results
module, PCRTCM is the personal computer host to root
Transputer communications module, NC is the network
communications subprocess, SC is the simulation
computation subprocess, and DA is the data access
subprocess. Figure 2, not reproduced, shows the system
software structure. Figures 4 and 5, reproduced below,
show the usual Transputer array configuration and the
GP²N²S² Transputer array configuration, respectively.

[Figure 1 diagram: the host (IBM PC) and the Transputer
parallel array; the legend distinguishes hard-wired from
soft-wired links]


Figure 1. GP²N²S² Hardware Configuration

In the usual configuration, the IBM PC 386 host is
connected to a T800 host Transputer, which is connected
in turn to other T800 Transputers including the root
Transputer; in the GP²N²S² configuration, however, the
host is connected directly to the T800 root Transputer,
which in turn is connected to the three other T800
Transputers in the four-Transputer array.


GP²N²S² comes with a Neural Network Description
Language (NNDL) and Occam 2, can simulate 12,000
neurons, permits over 300,000 connection weights, and
has a processing speed of 80,000 IPS (interconnection
weight updates per second); it can simulate a 10-city TSP
(Traveling Salesman Problem) in 12 seconds.
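

As a rough consistency check on these figures, and assuming
(the abstract does not say so) that the quoted 80,000 updates
per second is sustained throughout a run, one full sweep over
300,000 weights and the update budget of a 12-second run work
out as follows:

```latex
\[
\frac{300{,}000\ \text{weights}}{80{,}000\ \text{updates/s}} = 3.75\ \text{s per sweep},
\qquad
80{,}000\ \text{updates/s} \times 12\ \text{s} = 960{,}000\ \text{updates}.
\]
```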

[Figure 3 diagram: the main menu heads two groups of
modules: the PC server management software (which includes
PEM), run within the PC, and the Transputer simulation
software, run within the Transputer]


Figure 3. System Software Module Schematic


[Figure 4 diagram labels: host computer, host Transputer,
root Transputer]


Figure 4. Usual Transputer Array Configuration


[Figure 5 diagram label: Transputer array]


Figure 5. GP²N²S² Transputer Array Configuration


[Abstract] Advanced (i.e., high-level) Neural Network
Description Language (NNDL) is a non-procedural language
and is therefore quite different from traditional
(procedural) program design languages such as Fortran
and C. It is provided by GP²N²S² and specially designed
for writing neural network simulation programs at the
network, layer, and node levels. Several distinctive
features of NNDL and its editor/compiler are described.


Although the article has no numbered figures or tables as
such, a programming example is provided in which the
BP (backpropagation) algorithm is used to solve a
mirror-image symmetry problem. This problem is described
as follows: if the input binary sequence is
center-symmetric, then the output is 1; otherwise it is 0.
For instance, if the input is 011110, the output is 1, but
if the input is 011000, the output is 0. The article codes
the mirror-image symmetry problem for 6-bit binary
sequences in a program and shows the network topological
structure in a diagram (neither is reproduced here).


In the diagram, the first six nodes in the input layer are
the input binary sequence, while the final two nodes are
the virtual nodes, representing the input-layer threshold
and intermediate-layer threshold, respectively; both are
fixed as an input of -1.
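

The training set for the mirror-image symmetry problem is small
enough to enumerate in full. The following Python sketch is an
illustration only and is not the NNDL program from the article;
the function names and the choice to append the two virtual
nodes to each input row are assumptions based on the
description above.

```python
from itertools import product


def is_mirror_symmetric(bits):
    """Return 1 if the bit sequence reads the same in both directions, else 0."""
    return 1 if list(bits) == list(reversed(bits)) else 0


def build_dataset(n_bits=6):
    """Enumerate every n-bit input together with its target output.

    Each input row is the n bits followed by two virtual nodes fixed at -1,
    standing in for the input-layer and intermediate-layer thresholds.
    """
    samples = []
    for bits in product((0, 1), repeat=n_bits):
        inputs = list(bits) + [-1, -1]
        samples.append((inputs, is_mirror_symmetric(bits)))
    return samples


dataset = build_dataset()
```

With six bits there are 64 input patterns, of which 8 are
symmetric, so a trained network has to pick out a small
minority class.
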
[Abstract] GP²N²S²’s core software, the parallel
simulation controller (PSC), is introduced, and its design
concept and implementation are described. The PSC
includes three parts: the NC (sub)process, the DA
(sub)process, and the SC (sub)process, all three executed
in parallel within the Transputer array.


One figure (not reproduced) shows the Transputer array.
[Abstract] The Central Control Block (CCB) is the center
of both the data stream and control stream of GP²N²S²,
which is a MIMD [multiple instruction stream/multiple
data stream] system. The CCB handles system data
collection, processing, sending, task assignment, etc.
The implementation of the CCB is detailed, its task
assignment strategy is discussed, and the system datagram
protocol related to the CCB implementation is described.


The GP²N²S² system software consists of five large
modules: the compiler module (NNC), the CCB, the
PSC, the algorithm library (LIB), and the dynamic
graphics simulator (part of the System Integrated
Environment, or SIE), as shown in Figure 1 below. The CCB
itself consists of three parts, labeled CCB1, CCB2, and
CCB3 in the figure. Table 1 (not reproduced) lists NNC
output data and corresponding NIP (Network Information
Package) data and NSP (Network Structure Package) data,
while Table 2 (not reproduced) lists NTP (Network State
Package) and DIP (Dynamic Information Package) data.


[Figure 1 diagram: arrows mark the control stream and the
data stream; LIB is among the modules shown]


Figure 1. GP²N²S² System Software Structure
[Abstract] The GP²N²S² algorithm library (LIB) includes
several popular neural network algorithms, such as BP
(Backpropagation), Kohonen, and Hopfield. These have
been employed to solve some real problems such as the
TSP, Associative Memory (AM), and mirror symmetry.
Also, the dynamic code loading method has been used to
implement simulation calling for user-defined algorithms.


Ten figures (not reproduced) show a BP network, a 
single-node calculation flow chart for a BP algorithm, a 
mirror-symmetry network structure, Rumelhart’s simu- 
lation results, USTC’s simulation results, a Kohonen 
network, a single-node calculation flow chart for the 
Kohonen algorithm, a Kohonen network used to solve 
the TSP, standard samples of numerals for NN recogni- 
tion, and the simulation results for noisy numeral input 
samples. 
Design, Characteristics of GP²N²S² Integrated
Environment, Dynamic Graphical Simulator


[Article by Qin Xiaoou [4440 1420 7743] and Chen
Guoliang of the Dept. of Computer Science, USTC,
Hefei 230027: “Design, Characteristics of GP²N²S²
Integrated Environment, Dynamic Graphical Simulator,”
supported by grants from the State S&T Commission’s
Basic Research and High Technology Dept. and the
863 Plan; MS received 1 Jun 92]


[Abstract] The GP²N²S² system integrated environment
(SIE) provides users with a common interface supporting 
editing, compiling, and running. The dynamic graphical 
simulator which realizes the dynamic procedure simula- 
tion of artificial NNs is driven by and belongs to this 
environment. The design and features of the two parts 
are discussed. 


Two figures, both reproduced below, show the architec- 
ture of the SIE and the structure of the dynamic simula- 
tion controller. 
On Design, Analysis of MPLPC [Multi-Pulse-Excitation
Linear Predictive Coding] Vector Quantizer Based on
Artificial Neural Network


40100052A Beijing TONGXIN XUEBAO [JOURNAL 
OF CHINA INSTITUTE OF COMMUNICATIONS] 
in Chinese Vol 13 No 5, Sep 92 pp 3-10 


[English abstract of article by Xu Bingzheng and Peng 
Lei of South China University of Technology, Guang- 
zhou; MS received 18 Oct 91] 


[Text] The application of neural networks in speech 
compression encoding is discussed. A vector quantizer 
based on an artificial neural network is provided which 
can be used in the quantization of multi-pulse excitation 


network is somewhat similar to Kohonen’s net. It per- 
forms the parameter analysis and quantizing process 
together. Compared to the traditional method, which
performs analysis first and then quantization, it has
some excellent properties. We provide the architecture
and learning rule of the quantizing network, and compare
its implementation with that of the traditional method.
Finally, we simulate the quantizing network with
practical speech signals. The experimental results show
that our scheme is feasible.


Modified Kohonen Self-Organizing Neural
Network, Adaptive Vector Quantization of Images


40100052B Beijing TONGXIN XUEBAO [JOURNAL 
OF CHINA INSTITUTE OF COMMUNICATIONS] 
in Chinese Vol 13 No 5, Sep 92 pp 16-21 


[English abstract of article by Wang Wei, Cai Dejun, and 
Wan Faguan of Huazhong University of Science and 
Technology, Wuhan; MS received 14 Oct 91] 


[Text] Based on discussion of the principle of Kohonen’s
self-organizing feature maps (SOFM), a modified SOFM
(MSOFM) algorithm is proposed to reduce the blocking
effect of vector quantization of images. Two eigenvalues
are designed in the DCT (Discrete Cosine Transform)
domain to classify image blocks, then the application of
the MSOFM algorithm in adaptive vector quantization is
discussed. The results of computer simulation show that
the MSOFM training algorithm significantly reduces the
blocking effect and has better performance than the
SOFM algorithm.
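

For orientation, one training step of a plain Kohonen SOFM used
as a vector-quantization codebook can be sketched as below. This
is the textbook SOFM, not the authors' MSOFM, whose
modifications the abstract does not describe; the
one-dimensional map, the fixed learning rate, and the function
name are assumptions made for the illustration.

```python
def train_sofm_step(codebook, vector, rate=0.5, radius=1):
    """One SOFM step on a one-dimensional map of code vectors.

    Finds the best-matching code vector and moves it, together with its
    neighbours within `radius` map positions, toward the input vector.
    The codebook is updated in place; the winner's index is returned.
    """
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    winner = min(range(len(codebook)),
                 key=lambda k: sq_dist(codebook[k], vector))
    first = max(0, winner - radius)
    last = min(len(codebook), winner + radius + 1)
    for k in range(first, last):
        codebook[k] = [c + rate * (v - c) for c, v in zip(codebook[k], vector)]
    return winner


# Toy two-dimensional code vectors (illustrative only).
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
winner = train_sofm_step(codebook, [0.0, 2.0], rate=0.5, radius=0)
```

In image coding each input vector would be a block of pixels, and
the index of the winning code vector is what gets transmitted.
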


New Models of Holographic Network, Hamming 
Net, Their Application in Handwritten 
Chinese-Character Recognition 


40100052C Beijing TONGXIN XUEBAO [JOURNAL 
OF CHINA INSTITUTE OF COMMUNICATIONS] 
in Chinese Vol 13 No 5, Sep 92 pp 54-59 


[English abstract of article by Yu Yinglin and Deng Da 
of South China University of Technology, Guangzhou; 
MS received 16 Mar 92] 


[Text] We propose two new methods, one for holographic
memory and another for a fast-converging Hamming net,
both with an automatic attention moving function.
Satisfactory results have been obtained after the
construction of a handwritten Chinese-character
recognition system using these new models.


Applications of Neural Network for Handwritten 
Digit Recognition 

40100052D Beijing TONGXIN XUEBAO [JOURNAL 
OF CHINA INSTITUTE OF COMMUNICATIONS] 
in Chinese Vol 13 No 5, Sep 92 pp 60-64 


[English abstract of article by Wang Minghui, Pan 
Xinan, and Shen Min of Beijing University of Posts and 
Telecommunications; MS received 2 Mar 92] 
[Text] A feedforward multilayer neural network with a
backpropagation learning algorithm is used to recognize
handwritten digits written by 40 persons. First, an HP
scanner converts the original digit images to binary
images. Then, some additional preprocessing is performed
to segment and normalize the digits, the binary images
are scaled to a 32 x 32 pixel matrix, and the features are
extracted to change the representation of a digit from a
pixel matrix to a feature description. Finally, an error
rate of 0.4 percent at a reject rate of 25 percent is
achieved in the computer simulation. Some problems
encountered in the experiment are also discussed.
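

An error rate quoted at a given reject rate implies that the
recognizer declines to classify low-confidence digits. The
sketch below shows one common way to compute the two figures. It
rests on assumptions the abstract does not confirm: rejection is
decided by a confidence threshold, and both rates are fractions
of all test digits. The function name and the sample values are
made up for illustration.

```python
def error_and_reject_rates(results, threshold):
    """Compute (error_rate, reject_rate) for a recognizer with a reject option.

    results is a list of (confidence, is_correct) pairs, one per test digit.
    A digit is rejected when its confidence is below `threshold`; an error
    is an accepted digit that was classified incorrectly.
    """
    total = len(results)
    rejected = sum(1 for conf, _ in results if conf < threshold)
    errors = sum(1 for conf, ok in results if conf >= threshold and not ok)
    return errors / total, rejected / total


# Made-up confidences for illustration only; not data from the article.
toy = [(0.95, True), (0.90, True), (0.40, False), (0.85, False)]
error_rate, reject_rate = error_and_reject_rates(toy, threshold=0.5)
```

Raising the threshold trades a higher reject rate for a lower
error rate, which is why the two numbers are reported together.
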


Learning Algorithm for Speech Recognition With 
Recurrent Neural Network 


40100052E Beijing TONGXIN XUEBAO [JOURNAL 
OF CHINA INSTITUTE OF COMMUNICATIONS] 
in Chinese Vol 13 No 5, Sep 92 pp 76-79 


[English abstract of article by Li Haizhou and Xu Bing- 
zheng of South China University of Technology, Guang- 
zhou; MS received 2 Nov 91] 


[Text] Learning to associate static input/output pairs can 
be accomplished with layered connectionist networks 
with feedforward links alone, but feedback links are 
required to provide the network with state sequence 
information, in order to capture sequential behavior. In 
this paper, a multilayer network architecture with 
dynamic neurons which have multiple local feedbacks is 
built. The network proposed can be trained to memorize 
sequential patterns. A new learning algorithm is also 
derived which is more effective and easier to implement. 
Finally, some experiments on speech recognition of
Chinese numbers are designed to explore the capabilities
of the proposed networks to learn dynamic properties of
time-varying data. The performance of dynamic neurons
with different time delay periods is also shown.
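

The idea of a neuron with several local feedback links can be
illustrated with a small Python sketch. It is a generic
construction, not the architecture or learning algorithm of the
article: the tanh activation, the choice of feeding back the
neuron's own past outputs, and all names are assumptions.

```python
import math


def dynamic_neuron(inputs, in_weights, fb_weights):
    """Run one neuron with local output feedback over an input sequence.

    inputs is a list of input vectors, one per time step. fb_weights[k]
    weights the neuron's own output from k + 1 steps earlier, so the
    length of fb_weights sets how many delays are fed back.
    Returns the list of outputs, one per time step.
    """
    history = [0.0] * len(fb_weights)   # most recent output first
    outputs = []
    for x in inputs:
        net = sum(w * xi for w, xi in zip(in_weights, x))
        net += sum(v * y for v, y in zip(fb_weights, history))
        y = math.tanh(net)
        outputs.append(y)
        history = ([y] + history)[:len(fb_weights)]
    return outputs


# A single input pulse followed by silence (illustrative only).
outs = dynamic_neuron([[1.0], [0.0], [0.0]], in_weights=[1.0],
                      fb_weights=[0.5, 0.25])
```

The pulse at the first step still shapes the later outputs even
though the later inputs are zero, which is the state information
that feedforward links alone cannot provide.
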